TITLE

AN ENGINEERED SEED PROTEIN HAVING A HIGHER
PERCENTAGE OF ESSENTIAL AMINO ACIDS

FIELD OF THE INVENTION
This invention pertains to the development of seeds and seed storage proteins that
are enhanced in the quantity of amino acids that are essential to humans and animals,
and more particularly, the enhancement of the quantity of essential amino acids in the
Brazil Nut 2S albumin seed storage protein.

BACKGROUND OF THE INVENTION 

Many vertebrates, including man, lack the ability to manufacture a number of
amino acids and therefore require these preformed in the diet. These are called
essential amino acids. The two major sources of dietary protein in the US, corn and
soybeans, are deficient in some nutritionally indispensable (essential) amino acids. The
amino acids essential for humans and most animals that must be acquired from dietary
sources include the sulfur-containing residues methionine (Met) and cysteine (Cys),
along with the basic amino acid lysine (Lys) and aromatic tryptophan (Trp). Soybean
meal is a good source of Lys and Trp but poor in sulfur-containing residues and thus
must be supplemented with sulfur-rich corn meal to provide a suitably balanced diet. A
protein that contains a substantial proportion of both the sulfur amino acids and Lys,
and that can be expressed to high levels in seeds of crop plants, would have two
advantages. First, the need to supplement meals with individual amino acids, or to blend
different meals, would be obviated. Second, other meals that are left after extraction of
other commodities and presently discarded for lack of nutrition might become
alternative sources of balanced dietary protein. With the molecular genetic tools now
available, alteration of the amino acid composition of seed storage proteins to enhance
their nutritional quality is possible. Such altered seed storage proteins can in turn
enhance grain amino acid composition, thus adding value for the farmer.

Efforts to improve the amino acid content of crops through plant breeding have
resulted in only limited success and then only in the laboratory. A mutant corn line with
elevated whole kernel methionine concentration was isolated from cell culture after
selecting for growth in inhibitory concentrations of lysine and threonine (Phillips et al.,
(1985) Cereal Chem. 52:213-218). Similarly, soybean cell lines with increased
intracellular concentrations of methionine selected with ethionine have been reported
(Madison and Thompson (1988) Plant Cell Reports 7, 472-476), but no plants were
regenerated from these lines.

The amino acid content of seeds is determined primarily by the storage proteins
synthesized during seed development that serve as a major nutrient reserve following
germination. The quantity of this reserve varies from about 10% dry weight in cereals
to 40% in legumes. In some seeds, storage proteins can account for 50% or more of the
total protein. Although this abundance has meant that these proteins were some of the
first to be isolated, it is only recently that their amino acid sequences have been
determined. A number of sulfur-rich plant seed storage proteins have been identified
and their genes isolated. A gene from corn coding for a 15 kDa zein protein containing
11% methionine and 5% cysteine (Pedersen et al., (1986) J. Biol. Chem. 261,
6279-6284) and one coding for a 10 kDa zein containing 23% methionine and 3%
cysteine have been isolated (Kirihara et al., (1988) Mol. Gen. Genet. 21, 477-484;
Kirihara et al., Gene 71, 359-370), as well as another zein containing 37% methionine
and 3% cysteine (Chui and Falco, Plant Physiol. 107:291, 1995). Two seed albumin
genes from pea containing 8% and 16% cysteine have been reported (Higgins et al.,
(1986) J. Biol. Chem. 261, 11124-11130). The gene from Brazil Nut for a seed 2S
albumin containing 18% methionine and 8% cysteine has been isolated (Altenbach et
al., (1987) Plant Mol. Biol. 8, 239-250). Finally, a gene from rice codes for a 10 kDa
seed prolamin that has 19% methionine and 10% cysteine (Masumura et al., (1989)
Plant Mol. Biol. 12, 123-130). Combining the genetic signals controlling expression
and targeting in the seed with an engineered storage protein that has enhanced amino
acid composition would provide an attractive means of altering seed protein quality.

The use of natural variants of seed proteins rich in essential amino acids is a
promising approach, but applicable only when natural variants rich in the desired amino
acid can be found. Few natural proteins with high lysine content have been identified.
Proteins with combinations of essential amino acids are even less common, particularly
ones with sufficiently high percentages of, for example, methionine and lysine so that
the expression levels required to raise the level of those amino acids in seeds still
expressing endogenous proteins are not beyond the limits of gene expression
technology. Modern protein engineering technology offers a route to create such
proteins. One solution is to design proteins completely de novo, such as taught by
Falco et al. (World Patent Publication No. WO93/03160). This strategy is risky in that
the fate of such a protein in the cell is difficult to predict.

An alternative approach is to re-engineer a pre-existing storage protein already
replete with at least one of the essential amino acids. The issues surrounding the
expression of modified seed proteins have been reviewed (Krebbers et al. In: Transgenic
Plants, A. Hiatt, ed. Dekker Inc., New York, 1993, pp 37-60). Briefly, such
modifications must not disrupt the complex folding, processing, or intracellular
transport processes that these proteins undergo; as exemplified by Hoffman et al.
((1989) Plant Mol. Biol. 11, 717-729), when due care is not taken the resulting
modified protein risks being degraded.

The Brazil Nut 2S albumin represents a family of related proteins found in a
variety of species (Youle and Huang (1981) Amer. J. Bot. 55:44-48). They are small
proteins found in vivo as two subunits linked by disulfide bridges. The two subunits are
derived from a single precursor peptide which is extensively processed (Crouch et al.
(1983) J. Mol. App. Genet. 2, 273-283; Krebbers et al. (1988) Plant Physiol. 87,
859-866). Sequence analysis of 2S albumins from different species shows that while
the sequences from different species are not always highly conserved, the number of
cysteine residues and their arrangement in the sequence is, suggesting that the structure
of 2S albumins is similar between species.

There are many reports detailing the expression of the 2S albumin from Brazil Nut
in transgenic plants. For example, it has been expressed in the seeds of transformed
tobacco under the control of the phaseolin promoter. The 17 kDa precursor form of the
protein was correctly processed to the mature dimeric state composed of a 9 kDa and a
3 kDa subunit. The accumulation in the seeds of the tobacco resulted in an increase of
about 30% in methionine in the seeds (Altenbach et al. (1989) Plant Mol. Biol. 13,
513-522). With varying degrees of success the same protein has been expressed in
Brassica, Arabidopsis, and the grain legumes Vicia narbonensis and soybean
(Altenbach, (1992) Plant Mol. Biol. 18, 235-245; Guerche et al., (1990) Mol. Gen.
Genetics 221, 306-314; De Clercq et al. (1990) Plant Physiol 94, 970-979; Saalbach
et al. (1994) Mol. Gen. Genetics 242, 226-236). Chimeric genes linking the coding
regions of 19 and 23 kDa corn storage proteins to the Cauliflower Mosaic Virus 35S
promoter were found to be expressed at low levels in seeds, roots and leaves of
transformed tobacco (Schernthaner et al., (1988) EMBO J. 7, 1249-1255). Replacement
of the monocot regulatory regions with similar regions from dicots resulted in low level
seed specific expression of a 19 kDa zein in transformed petunia (Williamson et al.,
(1988) Plant Physiol. 88, 1002-1007) and tobacco (Ohtani et al., (1991) Plant Mol.
Biol. 16, 117-128). In another case, high level seed-specific expression of the 15 kDa
sulfur rich zein was found in transformed tobacco and the signal sequence of the
monocot precursor was also correctly processed (Hoffman et al., (1987) EMBO J. 6,
3212-3221).

Two bodies of work have previously demonstrated that limited changes can be
made to the sequence of 2S albumins which result in stably accumulated modified 2S
albumins in plant seeds. European Patent Publication No. EP-0-318,341,
Vandekerckhove et al. ((1989) Bio/Technology 7, 929-932) and De Clercq et al. ((1990)
Plant Physiol 94, 970-979) demonstrate that changes limited to the region between the
6th and 7th cysteine residues can be made. No changes beyond that region were made,
and the authors teach that changes elsewhere in the protein may disrupt the structure
because of the conservation of distances between the other cysteine residues. The
claims of Ballo (World Patent Publication No. WO 94/10315) disclose that
replacing arginine with the essential amino acid lysine in a putative 2S albumin
sequence should be tolerated by the protein. Since both amino acids are basic residues,
such a change was described as conservative. The details did not disclose whether the
protein was able to fold or accumulate in plants. It was also disclosed that the
replacements should not create any pairs of adjacent amino acids not found in the
natural or homologous seed storage proteins. Because of this latter constraint, not all
arginine residues in the putative protein were replaced with lysine. These two works
thus teach that 2S albumins can be modified in strictly defined ways, limited either to
a particular region or to conservative amino acid changes in specific positions.

SUMMARY OF THE INVENTION 

This invention pertains to a modified Brazil Nut 2S albumin seed storage protein
wherein: (i) the amino acid sequence of the modified protein is at least 40%
homologous to the wild type Brazil Nut 2S albumin seed storage protein; (ii) all
cysteine residues of the modified protein are conserved relative to the wild type protein;
(iii) at least 40% of proline residues are conserved relative to the wild type protein;
(iv) at least 80% of leucine residues are conserved relative to the wild type protein; and
(v) the modified protein comprises at least one non-conservative amino acid substitution
not within the hypervariable loop, the substitution consisting of replacement of a
non-essential amino acid with an essential amino acid.
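
The five criteria above can be read as a checklist applied to a wild type sequence and
a candidate. The following is a minimal sketch of such a check, not the procedure used
in this document. It assumes that the two sequences are already aligned and of equal
length, that "homologous" can be approximated by percent identity, and that the
hypervariable loop is the stretch between the 6th and 7th cysteines. The conservative
substitution groups and the toy sequences are hypothetical and are not taken from any
SEQ ID NO.

```python
# Essential amino acids (one-letter codes) as listed in the Definitions section.
ESSENTIAL = set("RHILKMFTWV")

# ASSUMPTION: a simple grouping used to decide whether a substitution is
# "conservative"; the document does not supply such a table.
CONSERVATIVE_GROUPS = [set("KRH"), set("DE"), set("NQ"), set("ST"),
                       set("AG"), set("ILVM"), set("FWY")]


def is_conservative(a, b):
    """True if both residues fall in the same assumed similarity group."""
    return any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def loop_bounds(seq):
    """Half-open index range strictly between the 6th and 7th cysteine."""
    cys = [i for i, aa in enumerate(seq) if aa == "C"]
    return cys[5] + 1, cys[6]


def fraction_conserved(wt, mod, residue):
    """Fraction of wild type positions holding `residue` that are unchanged."""
    positions = [i for i, aa in enumerate(wt) if aa == residue]
    if not positions:
        return 1.0
    return sum(mod[i] == residue for i in positions) / len(positions)


def meets_criteria(wt, mod):
    """Check criteria (i) to (v) on two pre-aligned, equal-length sequences."""
    if len(wt) != len(mod):
        raise ValueError("sequences must be aligned to equal length")
    identity = sum(a == b for a, b in zip(wt, mod)) / len(wt)
    start, end = loop_bounds(wt)
    qualifying = any(
        a != b
        and not is_conservative(a, b)
        and a not in ESSENTIAL
        and b in ESSENTIAL
        and not (start <= i < end)
        for i, (a, b) in enumerate(zip(wt, mod))
    )
    return (identity >= 0.40
            and fraction_conserved(wt, mod, "C") == 1.0
            and fraction_conserved(wt, mod, "P") >= 0.40
            and fraction_conserved(wt, mod, "L") >= 0.80
            and qualifying)


# Hypothetical toy sequences with eight cysteines (not real 2S albumin data).
WT_TOY = "QECPELCRQCLECQGCASNQCEELCQC"
MOD_OUTSIDE_LOOP = "QKCPELCRQCLECQGCASNQCEELCQC"  # E->K at position 2
MOD_INSIDE_LOOP = "QECPELCRQCLECQGCASNQCKELCQC"   # E->K only inside the loop

print(meets_criteria(WT_TOY, MOD_OUTSIDE_LOOP))
print(meets_criteria(WT_TOY, MOD_INSIDE_LOOP))
```

In this sketch a candidate whose only substitution lies inside the loop fails criterion
(v), while the same substitution outside the loop passes.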

A preferred embodiment of the instant invention is a modified Brazil Nut 2S
albumin seed storage protein wherein: (i) the amino acid sequence of the modified
protein is at least 82% homologous to the wild type Brazil Nut 2S albumin seed storage
protein; (ii) all cysteine, proline, leucine and methionine residues of the modified
protein are conserved relative to the wild type protein; (iii) all arginine residues of the
wild type protein are substituted with lysine residues; and (iv) the modified protein
comprises at least three non-conservative amino acid substitutions not within the
hypervariable loop, said substitutions comprising substituting two glutamic acid
residues with lysine residues and substituting one glutamine residue with a lysine
residue.

Another embodiment of the instant invention is an isolated nucleic acid fragment
encoding the modified Brazil Nut 2S albumin seed storage protein described above, and
a chimeric gene wherein the nucleic acid fragment is operably linked to suitable
regulatory sequences.

A further embodiment of the instant invention is a transformed plant comprising
in its genome the chimeric gene described above. Preferred plants include soybean,
rapeseed, sunflower, cotton, corn, tobacco, alfalfa, wheat, barley, oats, sorghum, rice
and forage grasses.

Yet another embodiment of the instant invention are seeds derived from the
transformed plant described above wherein the seeds comprise the chimeric gene.

Still another embodiment of the instant invention is a method for increasing the
essential amino acid content of seeds, the method comprising: (a) preparing a nucleic
acid fragment encoding a modified Brazil Nut 2S albumin seed storage protein wherein
(i) the amino acid sequence of the modified protein is at least 40% homologous to the
wild type Brazil Nut 2S albumin seed storage protein; (ii) all cysteine residues of the
modified protein are conserved relative to the wild type protein; (iii) at least 40% of
proline residues are conserved relative to the wild type protein; (iv) at least 80% of
leucine residues are conserved relative to the wild type protein; and (v) the modified
protein comprises at least one non-conservative amino acid substitution not within the
hypervariable loop, the substitution consisting of replacement of a non-essential amino
acid with an essential amino acid; (b) preparing a chimeric gene comprising the nucleic
acid fragment of step (a) operably linked to suitable regulatory sequences;
(c) transforming a plant with the chimeric gene of step (b); and (d) obtaining seeds from
the transformed plant of step (c).

BRIEF DESCRIPTION OF THE DRAWINGS
AND SEQUENCE DESCRIPTIONS
The Sequence Descriptions contain the one letter code for nucleotide sequence
characters and the three letter codes for amino acids as defined in conformity with the
IUPAC-IUB standards described in Nucleic Acids Research 13, 3021-3030 (1985) and
in the Biochemical Journal 219(2), 345-373 (1984) which are incorporated by reference
herein.

Figure 1 is the nucleotide sequence and deduced amino acid sequence of the wild
type Brazil Nut 2S albumin gene in plasmid pBNwt that was used as the starting point
for the genetic modifications described herein. Relevant restriction enzyme cleavage
sites are indicated.

Figure 2 is the nucleotide sequence and deduced amino acid sequence of the
modified Brazil Nut 2S albumin gene BNCNSS.

Figure 3 is the nucleotide sequence and deduced amino acid sequence of the
modified Brazil Nut 2S albumin gene BN11.

Figure 4 is the nucleotide sequence and deduced amino acid sequence of the
modified Brazil Nut 2S albumin gene BN15.

Figure 5 is the nucleotide sequence and deduced amino acid sequence of the
modified Brazil Nut 2S albumin gene BN17.

Figure 6 is the nucleotide sequence and deduced amino acid sequence of the
modified Brazil Nut 2S albumin gene BN18.

Figure 7 is the nucleotide sequence and deduced amino acid sequence of the
modified Brazil Nut 2S albumin gene BN19.

Figure 8 is the nucleotide sequence and deduced amino acid sequence of the
modified Brazil Nut 2S albumin gene BN153KW.

Figure 9 is a composite of the amino acid sequences encoded by the wild type
Brazil Nut 2S albumin gene (wt) and modified Brazil Nut 2S albumin genes
exemplified herein. Sulfur-containing amino acid residues are indicated in bold.

Figure 10 is the nucleotide sequence and deduced amino acid sequence of the
modified Brazil Nut 2S albumin gene AT2S1BN15.

Figure 11 is the nucleotide sequence and deduced amino acid sequence of the
modified Brazil Nut 2S albumin gene AT2S1BN19.

Figure 12 is the nucleotide sequence and deduced amino acid sequence of the
modified Brazil Nut 2S albumin gene AT2S1BN153W.

SEQ ID NO:1 is the nucleotide sequence and deduced amino acid sequence of the
wild type Brazil Nut 2S albumin gene in plasmid pBNwt that was used as the starting
point for the genetic modifications described herein.

SEQ ID NO:2 is the amino acid sequence of the wild type Brazil Nut 2S albumin
protein.

SEQ ID NOs:3-6 are four synthetic oligonucleotides used in the construction of
the modified Brazil Nut 2S albumin gene BNCNSS.

SEQ ID NO:7 is the nucleotide sequence and deduced amino acid sequence of the
modified Brazil Nut 2S albumin gene BNCNSS.

SEQ ID NOs:8 and 9 are two synthetic oligonucleotides used in the construction of
the modified Brazil Nut 2S albumin gene BN11.

SEQ ID NO:10 is the nucleotide sequence and deduced amino acid sequence of
the modified Brazil Nut 2S albumin gene BN11.

SEQ ID NOs:11-14 are four synthetic oligonucleotides used in the construction of
the modified Brazil Nut 2S albumin gene BN15.

SEQ ID NO:15 is the nucleotide sequence and deduced amino acid sequence of
the modified Brazil Nut 2S albumin gene BN15.

SEQ ID NOs:16 and 17 are two synthetic oligonucleotides used in the construction
of the modified Brazil Nut 2S albumin gene BN17.

SEQ ID NO:18 is the nucleotide sequence and deduced amino acid sequence of
the modified Brazil Nut 2S albumin gene BN17.

SEQ ID NOs:19 and 20 are two synthetic oligonucleotides used in the construction
of the modified Brazil Nut 2S albumin gene BN18.

SEQ ID NO:21 is the nucleotide sequence and deduced amino acid sequence of
the modified Brazil Nut 2S albumin gene BN18.

SEQ ID NOs:22 and 23 are two synthetic oligonucleotides used in the construction
of the modified Brazil Nut 2S albumin gene BN19.

SEQ ID NO:24 is the nucleotide sequence and deduced amino acid sequence of
the modified Brazil Nut 2S albumin gene BN19.

SEQ ID NOs:25-28 are four synthetic oligonucleotides used in the construction of
the modified Brazil Nut 2S albumin gene BN153KW.

SEQ ID NO:29 is the nucleotide sequence and deduced amino acid sequence of
the modified Brazil Nut 2S albumin gene BN153KW.

SEQ ID NOs:30 and 31 are two synthetic oligonucleotides used in the construction
of the Brazil Nut 2S albumin genes comprising the Arabidopsis 2S albumin precursor
sequence.

SEQ ID NO:32 is the nucleotide sequence and deduced amino acid sequence of
the modified Brazil Nut 2S albumin gene AT2S1BN15.

SEQ ID NO:33 is the nucleotide sequence and deduced amino acid sequence of
the modified Brazil Nut 2S albumin gene AT2S1BN19.

SEQ ID NOs:34 and 35 are two synthetic oligonucleotides used in the construction
of the modified Brazil Nut 2S albumin gene AT2S1BN153W.

SEQ ID NO:36 is the nucleotide sequence and deduced amino acid sequence of
the modified Brazil Nut 2S albumin gene AT2S1BN153W.

DETAILED DESCRIPTION OF THE INVENTION 
The present invention demonstrates that the 2S albumin from Brazil Nut is able to
accommodate much more radical changes than had been demonstrated previously and
that non-conservative replacements, made with the intent of enriching the protein with
essential amino acids outside of the region between the 6th and 7th cysteines, can be
tolerated without influencing the ability of the protein to be expressed in the seeds of
transgenic plants. Such altered Brazil Nut 2S albumins, modified so that they are
composed of more than two essential amino acids, can accumulate to sufficiently high
levels to influence the nutritional value of the seed protein.
The present invention describes nucleic acid fragments that encode a modified
high sulfur 2S albumin seed storage protein. This novel protein is analogous to the
protein isolated from Brazil Nuts that is rich in methionine and cysteine amino acids,
but has been altered to include other essential amino acids using site specific
replacement techniques.

The structure of the wild type Brazil Nut 2S albumin protein (Figure 1; SEQ ID
NO:2) is characterized by the presence of eight cysteine residues that form four
disulfide bonds. Twenty percent of the sequence is methionine, distributed throughout
the sequence both as isolated residues and in denser regions of adjacent occupancy.
Between the 6th and 7th cysteine residues is a region that has been termed the
"hypervariable loop," so designated because there are examples of engineered versions
of the Arabidopsis homolog of the Brazil Nut 2S albumin in which substantial amounts or
even all of this segment, except for four amino acids immediately adjacent to the amino
end of the region and five amino acids adjacent to the carboxyl end of the hypervariable
loop, have been replaced with other non-related sequences.

The Brazil Nut protein is also distinct from other 2S albumins because it is rich in
arginine, glutamine and some glutamic acid residues. The presence of a significant
fraction (15%) of residues that are basic, as arginine for example, suggested that these
positions might be suitable for replacement with lysine, an amino acid that is also basic
but, unlike arginine, is an essential dietary requirement for animals. Furthermore, it was
also considered possible that more radical alterations might be achievable, since protein
folding might depend mainly on the correct formation of disulfide bonds rather than the
identity of other residues. Genes have been constructed that encode a protein that has
all of the native argininyl residues replaced by lysyl residues and then further
supplemented with additional lysyl residues by alterations at positions not expected to
tolerate such changes. These genes have been expressed in a microorganism and in a
transgenic plant to alter the nutritional quality of the seed proteins. The increase in
methionine and lysine in the seed will be determined by a) the level of expression of
the engineered gene in the transformed plant, which depends in part on the seed specific
expression signals that are used, b) the percentage of methionine and lysine in the
coding region of the engineered gene, c) the stability of the expressed protein in the seed
of the transformed plant, which depends in part on its correct processing, intracellular
targeting, folding into a structure to allow accumulation in the seed, and ability to
withstand desiccation, and d) the compatibility of the new protein with the natural
variants of the transformed plant.
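
Factor b) above, the percentage of methionine and lysine encoded by the gene, is easy to
compute for any candidate sequence. The sketch below illustrates the arithmetic for a
blanket arginine to lysine replacement. It is an illustration only: the function names
are hypothetical, and the ten-residue toy peptide is invented for the example and is not
a fragment of any sequence described herein.

```python
def replace_arg_with_lys(protein):
    """Return the sequence with every arginine (R) replaced by lysine (K)."""
    return protein.replace("R", "K")


def percent(protein, residue):
    """Percentage of positions in `protein` occupied by `residue`."""
    return 100.0 * protein.count(residue) / len(protein)


# Hypothetical toy peptide: 2 Met, 2 Arg, 1 Lys in 10 residues.
TOY = "MRQCMERKLC"
MODIFIED = replace_arg_with_lys(TOY)

for name, seq in (("toy", TOY), ("Arg->Lys", MODIFIED)):
    print(name, seq,
          "Met %.1f%%" % percent(seq, "M"),
          "Lys %.1f%%" % percent(seq, "K"),
          "Arg %.1f%%" % percent(seq, "R"))
```

The replacement changes the lysine percentage without touching methionine or cysteine,
which is the sense in which it leaves the sulfur content of the protein unchanged.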

Transfer of the gene constructs of the invention (linked to suitable regulatory
sequences) into a living cell will result in the production of the encoded protein.
Additionally, transfer of the gene constructs of the invention into plants, particularly
Brassica, or other suitable crop plants such as corn, soybean or oil seed rape, with
suitable regulatory sequences to direct expression of the protein in seeds may result in
an increased level of sulfur-containing and basic amino acids, particularly methionine
and lysine, respectively, thus improving the nutritional quality of seed protein for
animals.
Definitions 

The following terms shall have the meaning set forth herein: The term "essential
amino acids" refers to those amino acids which must be obtained by animals and
humans from dietary sources. The essential amino acids are arginine (Arg), histidine
(His), isoleucine (Ile), leucine (Leu), lysine (Lys), methionine (Met), phenylalanine
(Phe), threonine (Thr), tryptophan (Trp) and valine (Val).

The term "nucleic acid" refers to a polynucleotide of high molecular weight which
can be single-stranded or double-stranded, composed of monomers (nucleotides)
containing a sugar, phosphate and a base which is either a purine or pyrimidine. A
"nucleic acid fragment" is a fraction of a given nucleic acid molecule. In higher plants,
deoxyribonucleic acid (DNA) is the genetic material while ribonucleic acid (RNA) is
involved in the transfer of information contained within DNA into proteins. A
"genome" is the entire body of genetic material contained in each cell of an organism.
The term "nucleotide sequence" refers to a polymer of DNA or RNA which can be
single- or double-stranded, optionally containing synthetic, non-natural or altered
nucleotide bases capable of incorporation into DNA or RNA polymers.

The term "homologous to" refers to the complementarity between the nucleotide
sequence of two nucleic acid molecules or between the amino acid sequences of two
protein molecules. Estimates of such homology are provided by either DNA-DNA or
DNA-RNA hybridization under conditions of stringency as is well understood by those
skilled in the art (as described in Hames and Higgins (eds.) Nucleic Acid Hybridisation,
IRL Press, Oxford, U.K.); or by the comparison of sequence similarity between two
nucleic acids or proteins.

The term "substantially similar" refers to nucleotide and amino acid sequences
that represent equivalents of the instant inventive sequences. For example, altered
nucleotide sequences which simply reflect the degeneracy of the genetic code but
nonetheless encode amino acid sequences that are identical to the inventive amino acid
sequences are substantially similar to the inventive sequences. In addition, amino acid
sequences that are substantially similar to the instant sequences are those wherein
overall amino acid identity is 95% or greater to the instant sequences. Modifications to
the instant invention that result in equivalent nucleotide or amino acid sequences are well
within the routine skill in the art. Moreover, the skilled artisan recognizes that
equivalent nucleotide sequences encompassed by this invention can also be defined by
their ability to hybridize, under stringent conditions (0.1X SSC, 0.1% SDS, 65°C), with
the nucleotide sequences that are within the literal scope of the instant claims.
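
The degeneracy point in the definition above can be made concrete with a short sketch:
two nucleotide sequences that differ at several positions can still encode the same
amino acid sequence, and amino acid identity can then be compared with the 95% threshold.
This is an illustration under stated assumptions. It uses the standard genetic code, an
ungapped position-by-position identity on equal-length sequences, and two hypothetical
12-nucleotide sequences that are not taken from any SEQ ID NO.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA coding sequence, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def percent_identity(a, b):
    """Ungapped percent identity of two equal-length sequences (an assumption)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical sequences using different codons for Lys, Arg and Cys.
SEQ_A = "ATGAAACGTTGC"
SEQ_B = "ATGAAGAGATGT"

print(translate(SEQ_A), translate(SEQ_B))
print("nucleotide identity: %.1f%%" % percent_identity(SEQ_A, SEQ_B))
protein_identity = percent_identity(translate(SEQ_A), translate(SEQ_B))
print("amino acid identity: %.1f%%, meets 95%% threshold: %s"
      % (protein_identity, protein_identity >= 95.0))
```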
The term "gene" refers to a nucleic acid fragment that expresses a specific protein,
including regulatory sequences preceding (5' non-coding) and following (3' non-coding)
the coding region. "Native" gene refers to the gene as found in nature with its own
regulatory sequences. "Chimeric" gene refers to a gene comprising heterogeneous
regulatory and coding sequences. "Endogenous" gene refers to the native gene
normally found in its natural location in the genome. A "foreign" gene refers to a gene
not normally found in the host organism but that is introduced by gene transfer.

The term "coding sequence" refers to a DNA sequence that codes for a specific
protein and excludes the non-coding sequences. It may constitute an "uninterrupted
coding sequence", i.e., lacking an intron, such as in a cDNA or it may include one or
more introns bounded by appropriate splice junctions. An "intron" is a sequence of
RNA which is contained in the primary transcript but which is removed through
cleavage and re-ligation of the RNA within the cell to create the mature mRNA that can
be translated into a protein.

The terms "initiation codon" and "termination codon" refer to units of three
adjacent nucleotides in a coding sequence that specify initiation and chain termination,
respectively, of protein synthesis (mRNA translation). "Open reading frame" refers to
the region of a DNA or RNA that is between translation initiation and termination
codons and is therefore capable of encoding a protein product.
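
As an illustration of the open reading frame definition above, the following sketch scans
a DNA string for the first ATG and reads in-frame triplets until a termination codon is
reached. It is a simplified sketch: it looks at one strand only, assumes ATG as the sole
initiation codon, returns the region inclusive of both codons, and uses a hypothetical
toy sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(dna):
    """Return the first ATG...stop stretch read in frame, or None if absent."""
    start = dna.find("ATG")
    while start != -1:
        # Step through complete codons in the frame set by this ATG.
        for i in range(start, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return dna[start:i + 3]
        # No in-frame stop codon: try the next ATG downstream.
        start = dna.find("ATG", start + 1)
    return None


# Hypothetical toy sequence with two flanking nucleotides on each side.
TOY_DNA = "CCATGAAATGCTAAGG"
print(first_orf(TOY_DNA))
```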

The term "RNA transcript" refers to the product resulting from RNA
polymerase-catalyzed transcription of a DNA sequence. When the RNA transcript is a
perfect complementary copy of the DNA sequence, it is referred to as the primary
transcript; when it is an RNA sequence derived from posttranscriptional processing of
the primary transcript, it is referred to as the mature RNA. "Messenger RNA" (mRNA)
refers to the RNA that is without introns and that can be translated into protein by the
cell. "cDNA" refers to a single- or a double-stranded DNA that is complementary to and
derived from mRNA.

The term "regulatory sequences" means nucleotide sequences located upstream
(5'), within, and/or downstream (3') to a coding sequence, which control the
transcription and/or expression of the coding sequences, potentially in conjunction with
the protein biosynthetic apparatus of the cell. These nucleotide sequences include a
promoter sequence, a translation leader sequence, a transcription termination sequence,
and a polyadenylation sequence.

The term "promoter" refers to a DNA sequence in a gene, usually upstream (5') to
its coding sequence, which controls the expression of the coding sequence by providing
the recognition for RNA polymerase and other factors required for proper transcription.
A promoter may also contain DNA sequences that are involved in the binding of protein
factors which control the effectiveness of transcription initiation in response to
physiological or developmental conditions. It may also contain enhancer elements.
The term "enhancer" means a DNA sequence which can stimulate promoter
activity. It may be an innate element of the promoter or a heterologous element inserted
to enhance the level and/or tissue-specificity of a promoter. "Constitutive promoters"
refers to those promoters that direct gene expression in substantially all tissues and at
substantially all times. "Organ-specific" or "development-specific" promoters as
referred to herein are those that direct gene expression almost exclusively in specific
organs, such as leaves or seeds, or at specific development stages in an organ, such as in
early or late embryogenesis, respectively.

The term "expression" means the production of the protein product encoded by a
gene.

The term "3' non-coding sequences" refers to the DNA sequence portion of a gene
that contains a transcription termination signal, polyadenylation signal, and any other
regulatory signal capable of affecting mRNA processing or gene expression. The
polyadenylation signal is usually characterized by affecting the addition of polyadenylic
acid tracts to the 3' end of the mRNA precursor.

The term "5' non-coding sequences" refers to the DNA sequence portion of a gene 
that contains a promoter sequence and a translation leader sequence. 

The term "translation leader sequence" refers to that DNA sequence portion of a
gene between the promoter and coding sequence that is transcribed into RNA and is
present in the fully processed mRNA upstream (5') of the translation start codon. The
translation leader sequence may affect processing of the primary transcript to mRNA,
mRNA stability or translation efficiency.

The term "mature" protein refers to a post-translationally processed polypeptide
without its signal peptide. "Precursor" protein refers to the primary product of
translation of an mRNA. "Signal peptide" refers to the amino terminal extension of a
polypeptide, which is translated in conjunction with the polypeptide forming a precursor
peptide and which is required for its entrance into the secretory pathway. The term
"signal sequence" refers to a nucleotide sequence that encodes the signal peptide.

The term "intracellular localization sequence" refers to a nucleotide sequence that
encodes an intracellular targeting signal. An "intracellular targeting signal" is an amino
acid sequence which is translated in conjunction with a protein and directs it to a
particular sub-cellular compartment. "Endoplasmic reticulum (ER) stop transit signal"
refers to a carboxy-terminal extension of a polypeptide, which is translated in
conjunction with the polypeptide and causes a protein that enters the secretory pathway
to be retained in the ER. "ER stop transit sequence" refers to a nucleotide sequence that
encodes the ER targeting signal. Other intracellular targeting sequences encode
targeting signals active in seeds and/or leaves and vacuolar targeting signals.

The term "transformation" refers to the transfer of a nucleic acid fragment into the 
genome of a host cell, resulting in genetically stable inheritance. Host cells containing 
the transformed nucleic acid fragments are referred to as "transgenic" cells, and 
organisms comprising transgenic cells are referred to as "transgenic organisms". 
Examples of methods of transformation of plants and plant cells include 
Agrobacterium-mediated transformation (De Blaere et al. (1987) Meth. Enzymol. 143, 
277) and particle bombardment technology (Klein et al. (1987) Nature (London) 327, 
70-73; U.S. Patent No. 4,945,050). Whole plants may be regenerated from transgenic 
cells by methods well known to the skilled artisan (see, for example, Fromm et al. 
(1990) Bio/Technology 8, 833). 

The term "cassette" means a nucleic acid fragment prepared by the annealing of 
two synthetic and complementary oligonucleotides. 
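
By way of illustration only, the complementarity requirement in this definition can be checked computationally. The following Python sketch is not part of the disclosed method; the function names and the example sequence are hypothetical and do not correspond to any SEQ ID NO. It assumes both oligonucleotides are written 5' to 3' and that a blunt-ended duplex is intended.

```python
# Hypothetical illustration: test whether two oligonucleotides can anneal
# into a fully base-paired, blunt-ended double-stranded cassette.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def forms_cassette(top: str, bottom: str) -> bool:
    """True if the two strands (both written 5'->3') are fully complementary."""
    return reverse_complement(top) == bottom.upper()


# Toy strands only; not taken from the sequence listing.
top_strand = "CATATGAAGCCGCGG"
bottom_strand = reverse_complement(top_strand)
```

In this sketch a palindromic recognition sequence such as CATATG is its own reverse complement, which is why the same restriction site appears on both strands of an annealed cassette.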

Based upon published sequences (Altenbach et al., (1987) Plant Molecular 
Biology 8, 239-250; Gander et al., (1991) Plant Molecular Biology 16, 437-448) of the 
Brazil Nut 2S albumin gene, oligonucleotides were synthesized to allow construction of 
modified forms of the wild type gene, either through mismatch site-specific procedures 
or as double-stranded DNA cassettes. The first mutations were introduced into the wild 
type 2S gene using oligonucleotides that were complementary to the sequence except in 
those positions coinciding with the places where base changes were desired. The 
construct used to achieve these changes was an M13-based plasmid that allowed 
isolation of the single-stranded form of the wild type 2S albumin gene (pM13BNwt). The 
use of four oligonucleotides (SEQ ID NOs:3-6) through only one round of mutagenesis 
produced a version of the gene with 7 of the 15 Arg residues replaced by Lys and also 
coincidentally introduced new unique restriction sites, SacII, StuI, NheI and ClaI. The 
final version also had the NcoI site of the wild type gene removed. This version was 
designated pBNCNSS (Figure 2; SEQ ID NO:7). Although removal of the NcoI site 
meant loss of one Met residue, this was compensated by the N-terminal Met introduced 
with the NdeI site at the 5'-end of the gene of the original construct. 

Replacement of Arg codons in the gene with Lys codons was achieved in a 
progressive fashion using the new restriction sites available in pBNCNSS (Figure 2) 
and replacement of segments of the gene with double-stranded cassettes. It has been our 
experience that construction of genes using large cassettes is not predictably successful and 
often results in poor efficiency in the number of transformants that carry the desired 
DNA after amplification. This may be due to the bacterium using correcting 
mechanisms when transformed with relaxed duplex DNA. The oligonucleotides 
synthesized to form the cassettes were therefore routinely extended at the 5' ends so that 
once the complementary pairs of strands had been annealed to form the cassette, 
restriction digestion would result in fragments bearing ends of single-stranded DNA 
with precise complementarity to the single-stranded ends of a vector that had been 
treated with the same restriction enzyme; these could then be ligated more efficiently 
into the expression vector of choice. By introducing more restriction sites or removing 
existing sites, the altered DNA from each series of new constructs could be readily 
assessed for success. All new versions of the gene were sequenced fully to confirm that 
the desired mutations had been introduced. As protein expression was ultimately to be 
assessed with progressive enrichment of Lys, this version of the gene was introduced 
into pET24a (Novagen, Inc., Madison, WI), a commercial plasmid allowing expression 
from the T7 promoter in suitable hosts. This provided plasmid pETBNCNSS. 

The first construct that was made using cassettes, pETBN11, comprising the 
modified Brazil Nut 2S albumin gene BN11 (Figure 3; SEQ ID NO:10), is characterized 
by the introduction of twelve Lys codons (compared to zero Lys codons in the wild type 
2S gene). In addition, the threonine (Thr) codon corresponding to position 33 of the 
wild type protein was replaced with a serine (Ser) codon to introduce an SphI site 
adjacent to the SacII site. 

Complete replacement of all Arg codons was achieved using two sets of 
double-stranded cassettes (SEQ ID NOs:11-14) to replace the SphI to HindIII segment of 
BN11. In this case, an internal StyI site was created for convenient ligation of the two sets 
of cassettes after annealing, and the opportunity was taken to remove 125 bases from 
the 3' non-coding region of BN11. The resulting construct, pETBN15, was the version 
of the enriched gene (BN15; Figure 4; SEQ ID NO:15) with all fifteen Arg codons 
replaced by Lys codons and from which all other variants were made. 
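
Purely as an illustration of the conservative Arg to Lys substitution described above, the following Python sketch replaces every in-frame Arg codon of a coding sequence with a Lys codon and reports how many were changed. It is a computational analogy, not the cassette procedure itself. The function name, the choice of AAG as the replacement codon, and the toy sequence are assumptions made for the example and are not taken from the sequence listing.

```python
# Hypothetical illustration: in-frame Arg -> Lys codon replacement.
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}


def replace_arg_with_lys(cds: str, lys_codon: str = "AAG") -> tuple[str, int]:
    """Return the modified coding sequence and the number of Arg codons replaced.

    Only codons in the reading frame starting at position 0 are examined, so
    out-of-frame occurrences of Arg triplets are left untouched.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence must be a whole number of codons")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    replaced = sum(codon in ARG_CODONS for codon in codons)
    modified = "".join(lys_codon if codon in ARG_CODONS else codon for codon in codons)
    return modified, replaced


# Toy sequence: Met-Arg-Glu-Arg-stop (not a real 2S albumin sequence).
toy_modified, toy_count = replace_arg_with_lys("ATGCGTGAAAGATAA")
```

The sequence length is unchanged by the substitution, which mirrors the fact that the Arg to Lys changes described above alter composition without altering the length of the encoded protein.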

Variants of BN15 that explored introduction of other essential amino acids 
through non-conservative changes included BN17 (Figure 5; SEQ ID NO:18), wherein 
Ser 107 was changed to Lys and glycine (Gly) 105 to Lys by oligonucleotide replacement 
(SEQ ID NOs:16 and 17). Likewise, BN18 was a version of BN17 wherein Ser 44 
was changed to Lys (Figure 6; SEQ ID NO:21) by oligonucleotide replacement (SEQ ID 
NOs:19 and 20). 

A version of BN15 was further enhanced with Lys and Met residues by replacing 
amino acids that are not considered of this type, i.e., non-conservative changes. Thus 
replacement of the NdeI-SphI fragment of BN15 with two oligonucleotides (SEQ ID 
NOs:22 and 23) introduced two further Lys residues in place of two glutamic acid (Glu) 
residues at positions 4 and 28, and introduced an extra Met replacing Glu 27. This 
produced the modified Brazil Nut 2S albumin gene BN19 (Figure 7; SEQ ID NO:24). 
Other variants, such as BN153KW (Figure 8; SEQ ID NO:29), explored 
introduction of the essential amino acid Trp into the sequence of the Brazil Nut 2S 
albumin. Figure 9 is a composite of the amino acid sequences encoded by each of the 
wild type and modified Brazil Nut 2S albumin genes described above. 

In order to create the gene that encodes the precursor form of the 2S storage 
protein, nucleic acid fragments were constructed that recreated the precursor sequence 
of an Arabidopsis 2S protein. An oligonucleotide cassette (comprising SEQ ID NOs:30 
and 31) encoding the Arabidopsis precursor sequence was designed so that introduction 
of the cassette into the 5'-end of the wild type and modified 2S albumin genes described 
above resulted in an in-frame fusion of the precursor sequence with the sequence of the 
mature genes. The resulting genes had an NcoI site at the ATG initiation codon of 
the precursor sequence and an NdeI site at the first codon of the mature sequence. These 
precursor genes were designated AT2S1BNwt (not shown), AT2S1BN15 (Figure 10; 
SEQ ID NO:32), and AT2S1BN19 (Figure 11; SEQ ID NO:33) to indicate they 
contained the wild type, BN15, and BN19 variants of the 2S albumin gene, 
respectively. Another precursor gene, AT2S1BN153W (Figure 12; SEQ ID NO:36), 
was prepared by direct replacement of the NheI to HindIII fragment of pAT2S1BN15 
with a cassette which directed the incorporation of an increased number of tryptophanyl 
residues in the gene product. 
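
As a hedged illustration of what an in-frame fusion requires, the Python sketch below joins a precursor coding segment to a mature coding sequence and checks that the reading frame is preserved. The function name and the short toy segments are hypothetical; the actual precursor cassette is defined by SEQ ID NOs:30 and 31 and is not reproduced here.

```python
# Hypothetical illustration: verify that a precursor segment fused to a
# mature coding sequence keeps a single open reading frame.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(precursor: str, mature: str) -> str:
    """Join the two segments and confirm the frame is preserved at the junction."""
    precursor, mature = precursor.upper(), mature.upper()
    if not precursor.startswith("ATG"):
        raise ValueError("precursor segment must begin with an ATG initiation codon")
    if len(precursor) % 3 != 0:
        raise ValueError("precursor length is not a multiple of 3; the frame would shift")
    fused = precursor + mature
    if len(fused) % 3 != 0:
        raise ValueError("fused sequence is not a whole number of codons")
    codons = [fused[i:i + 3] for i in range(0, len(fused), 3)]
    # Any stop codon before the final codon would truncate the fusion protein.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("premature stop codon in the fused reading frame")
    return fused


# Toy segments for illustration only.
toy_precursor = "ATGGCTAAC"
toy_mature = "ATGCAGGAATAA"
fused_gene = fuse_in_frame(toy_precursor, toy_mature)
```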

The nucleic acid fragment coding for the sulfur rich seed 2S protein may be 
attached to suitable regulatory sequences and used to overproduce the protein in 
microbes such as bacteria or yeast, or in transgenic plants such as Brassica, cereals or 
legumes. Such a DNA construction may include either the wild type 2S gene or an 
engineered gene. One skilled in the art can isolate the coding sequences from the 
fragment of the invention by using and/or creating restriction endonuclease sites. 
Expression of enriched 2S protein in E. coli 

To express the modified Brazil Nut 2S coding sequences in E. coli, the 
commercial expression vector pET24a was used. This vector employs the 
bacteriophage T7 RNA polymerase/T7 promoter system (Studier et al., (1990) Methods 
in Enzymology 185, 60-89) for gene transcription. The variants of all the 2S albumin 
genes, including the wild type construct, were ligated into pET24a using the 
NdeI-HindIII sites. These constructs were used to transform competent E. coli cells 
(strain BL21), which were grown to mid-log phase in LB before induction with IPTG. 
The protein expressed in transformed E. coli hosts was assessed by electrophoresis of 
lysed cell contents on SDS polyacrylamide gels and comparison to authentic Brazil Nut 
2S albumin isolated from the native source. Verification that the expressed protein 
bands on gels included the recombinant 2S proteins was achieved by electroblotting the 
proteins to PVDF membranes, obtaining the N-terminal sequences, and confirming that 
the experimental sequences matched the predicted sequences. 
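
The comparison of experimental and predicted N-terminal sequences described above can be sketched in Python as follows. This is an illustrative assumption about how such a comparison might be scripted, not a description of how it was done; the function names and the toy coding sequence are hypothetical. The codon table is the standard genetic code.

```python
# Hypothetical illustration: compare an observed N-terminal sequence with the
# translation predicted from a coding sequence (standard genetic code).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence in frame 0, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def n_terminus_matches(cds: str, observed: str) -> bool:
    """True if the observed N-terminal residues begin the predicted protein."""
    return translate(cds).startswith(observed.upper())


# Toy coding sequence (Met-Lys-Glu-Lys-stop); not a real 2S albumin gene.
toy_cds = "ATGAAGGAAAAGTAA"
```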
Expression of enriched 2S protein in plants 

The nucleic acid fragments of the invention can be used to produce large 
quantities of the 2S protein enriched in essential amino acids, especially methionine and 
lysine, via fermentation in E. coli or other microorganisms. To do this, the nucleic acid 
fragment of the invention can be operably linked to a suitable regulatory sequence 
comprising a promoter sequence, a translation leader sequence and a 3' non-coding 
sequence. The chimeric gene can then be introduced into a microorganism via 
transformation and the transformed organism grown under conditions resulting in high 
expression of the engineered gene. The cells containing the protein rich in essential 
amino acids can be collected and the enriched protein extracted. Because high level 
production is not toxic to the cells, higher levels could be achieved using other strains. 

A preferred class of hosts for the expression of the coding sequence of modified 
Brazil Nut 2S albumin proteins is eukaryotic hosts, particularly the cells of higher 
plants. Particularly preferred among the higher plants and the seeds derived from them 
are soybean, rapeseed (Brassica napus, B. campestris), sunflower (Helianthus annuus), 
cotton (Gossypium hirsutum), corn, tobacco (Nicotiana tabacum), alfalfa (Medicago 
sativa), wheat (Triticum sp.), barley (Hordeum vulgare), oats (Avena sativa L.), 
sorghum (Sorghum bicolor), rice (Oryza sativa), and forage grasses. Expression in 
plants will use regulatory sequences functional in such plants. 

The expression of foreign genes in plants is well-established (De Blaere et 
al. (1987) Meth. Enzymol. 153, 277-291). The origin of the promoter chosen to drive the 
expression of the coding sequence is not critical as long as it has sufficient 
transcriptional activity to accomplish the invention by increasing the level of 
translatable mRNA for modified Brazil Nut 2S albumin proteins in the desired host 
tissue. Preferred promoters for expression in all plant organs, and especially for 
expression in leaves, include those directing the 19S and 35S transcripts in Cauliflower 
mosaic virus (Odell et al. (1985) Nature 313, 810-812; Hull et al. (1987) Virology 86, 
482-493), small subunit of ribulose 1,5-bisphosphate carboxylase (Morelli et al. (1985) 
Nature 315, 200; Broglie et al. (1984) Science 224, 838; Herrera-Estrella et al. (1984) 
Nature 310, 115; Coruzzi et al. (1984) EMBO J. 3, 1671; Faciotti et al. (1985) 
Bio/Technology 3, 241), maize zein protein (Matzke et al. (1984) EMBO J. 3, 1525), and 
chlorophyll a/b binding protein (Lampa et al. (1986) Nature 316, 750-752). 

Depending upon the application, it may be desirable to select promoters that are 
specific for expression in one or more organs of the plant. Examples include the 
light-inducible promoters of the small subunit of ribulose 1,5-bisphosphate carboxylase, if the 
expression is desired in photosynthetic organs, or promoters active specifically in seeds. 

Preferred promoters are those that allow expression of the protein specifically in 
seeds. This may be especially useful, since seeds are the primary source of vegetable 
protein and also since seed-specific expression will avoid any potential deleterious 
effect in non-seed organs. Examples of seed-specific promoters include, but are not 
limited to, the promoters of seed storage proteins, which represent more than 50% of 
total seed protein in many plants. The seed storage proteins are strictly regulated, being 
expressed almost exclusively in seeds in a highly organ-specific and stage-specific 
manner (Higgins et al. (1984) Ann. Rev. Plant Physiol. 35, 191-221; Goldberg 
et al. (1989) Cell 56, 149-160; Thompson et al. (1989) BioEssays 10, 108-113). 
Moreover, different seed storage proteins may be expressed at different stages of seed 
development. 

There are currently numerous examples for seed-specific expression of seed 
storage protein genes in transgenic dicotyledonous plants. These include genes from 
dicotyledonous plants for bean β-phaseolin (Sengupta-Gopalan et al. (1985) Proc. Natl. 
Acad. Sci. USA 82, 3320-3324; Hoffman et al. (1988) Plant Mol. Biol. 11, 717-729), 
bean lectin (Voelker et al. (1987) EMBO J. 6, 3571-3577), soybean lectin (Okamuro 
et al. (1986) Proc. Natl. Acad. Sci. USA 83, 8240-8244), soybean Kunitz trypsin 
inhibitor (Perez-Grau et al. (1989) Plant Cell 1, 1095-1109), soybean β-conglycinin 
(Beachy et al. (1985) EMBO J. 4, 3047-3053; Barker et al. (1988) Proc. Natl. Acad. Sci. 
USA 85, 458-462; Chen et al. (1988) EMBO J. 7, 297-302; Chen et al. (1989) Dev. 
Genet. 10, 112-122; Naito et al. (1988) Plant Mol. Biol. 11, 109-123), pea vicilin 
(Higgins et al. (1988) Plant Mol. Biol. 11, 683-695), pea convicilin (Newbigin et al. 
(1990) Planta 180, 461), pea legumin (Shirsat et al. (1989) Mol. Gen. Genetics 215, 
326), and rapeseed napin (Radke et al. (1988) Theor. Appl. Genet. 75, 685-694), as well as 
genes from monocotyledonous plants such as for maize 15 kD zein (Hoffman et al. 
(1987) EMBO J. 6, 3213-3221; Schernthaner et al. (1988) EMBO J. 7, 1249-1253; 
Williamson et al. (1988) Plant Physiol. 88, 1002-1007), barley β-hordein (Marris et al. 
(1988) Plant Mol. Biol. 10, 359-366) and wheat glutenin (Colot et al. (1987) EMBO J. 
6, 3559-3564). Moreover, promoters of seed-specific genes operably linked to 
heterologous coding sequences in chimeric gene constructs also maintain their temporal 
and spatial expression pattern in transgenic plants. Such examples include the Arabidopsis 
thaliana 2S seed storage protein gene promoter to express enkephalin peptides in 
Arabidopsis and B. napus seeds (Vandekerckhove et al. (1989) Bio/Technology 7, 
929-932), bean lectin and bean β-phaseolin promoters to express luciferase (Riggs et al. 
(1989) Plant Sci. 63, 47-57), and wheat glutenin promoters to express chloramphenicol 
acetyltransferase (Colot et al. (1987) EMBO J. 6, 3559-3564). 

Of particular use in the expression of the nucleic acid fragment of the invention 
will be the heterologous promoters from several extensively-characterized soybean seed 
storage protein genes such as those for the Kunitz trypsin inhibitor (Jofuku et al. (1989) 
Plant Cell 1, 1079-1093; Perez-Grau et al. (1989) Plant Cell 1, 1095-1109), glycinin 
(Nielson et al. (1989) Plant Cell 1, 313-328), and β-conglycinin (Harada et al. (1989) Plant 
Cell 1, 415-425). Promoters of genes for the α'- and β-subunits of soybean β-conglycinin 
storage protein will be particularly useful in expressing the modified Brazil Nut 2S 
albumin mRNA in the cotyledons at mid- to late-stages of soybean seed 
development (Beachy et al. (1985) EMBO J. 4, 3047-3053; Barker et al. (1988) Proc. 
Natl. Acad. Sci. USA 85, 458-462; Chen et al. (1988) EMBO J. 7, 297-302; Chen et al. 
(1989) Dev. Genet. 10, 112-122; Naito et al. (1988) Plant Mol. Biol. 11, 109-123) in 
transgenic plants, since: a) there is very little position effect on their expression in 
transgenic seeds, and b) the two promoters show different temporal regulation: the 
promoter for the α'-subunit gene is expressed a few days before that for the β-subunit 
gene. 

Also of particular use in the expression of the nucleic acid fragments of the 
invention will be the heterologous promoters from several extensively characterized 
corn seed storage protein genes such as those from the 10 kD zein (Kirihara et al. (1988) 
Gene 71, 359-370), the 27 kD zein (Prat et al. (1987) Gene 52, 51-49; Gallardo et al. 
(1988) Plant Sci. 54, 211-281), and the 19 kD zein (Marks et al. (1985) J. Biol. Chem. 
260, 16451-16459). The relative transcriptional activities of these promoters in corn 
have been reported (Kodrzyck et al. (1989) Plant Cell 1, 105-114), providing a basis for 
choosing a promoter for use in chimeric gene constructs for corn or other monocots. 

Proper level of expression of 2S engineered genes enriched in essential amino 
acids may require the use of different promoters. Such chimeras can be transferred into 
host plants either together in a single expression vector or sequentially using more than 
one vector or more than one copy of the enriched gene transcribed from the same 
vector. 

It is envisioned that the introduction of enhancers or enhancer-like elements into 
promoter constructs will also provide increased levels of primary transcription for 
modified Brazil Nut 2S albumin proteins to accomplish the invention. This would 
include viral enhancers such as that found in the 35S promoter (Odell et al. (1988) Plant 
Mol. Biol. 10, 263-272), enhancers from the opine genes (Fromm et al. (1989) Plant 
Cell 1, 977-984), or enhancers from any other source that result in increased 
transcription when placed into a promoter operably linked to the nucleic acid fragment 
of the invention. 

Of particular importance is the DNA sequence element isolated from the gene for 
the α'-subunit of β-conglycinin that can confer 40-fold seed-specific enhancement to a 
constitutive promoter (Chen et al. (1988) EMBO J. 7, 297-302; Chen et al. (1989) Dev. 
Genet. 10, 112-122). One skilled in the art can readily isolate this element and insert it 
within the promoter region of any gene in order to obtain seed-specific enhanced 
expression with the promoter in transgenic plants. Insertion of such an element in any 
seed-specific gene that is expressed at different times than the β-conglycinin gene will 
result in expression in transgenic plants for a longer period during seed development. 

The invention can also be accomplished by a variety of other methods to obtain 
the desired end. In one form the invention is based on modifying plants to produce 
increased levels of 2S enriched protein by having significantly larger numbers of copies 
of the modified gene either through enhanced promotion or multiple copies on each 
message. 

Any 3' non-coding region capable of providing a transcription termination signal, 
a polyadenylation signal and other regulatory sequences that may be required for the 
proper expression of the modified Brazil Nut 2S albumin protein coding region can be 
used to accomplish the invention. This would include the 3' end from a heterologous 
zein gene, the 3' end from any storage protein such as the 3' end of the soybean 
β-conglycinin gene, the 3' end from viral genes such as the 3' end of the 35S or the 19S 
cauliflower mosaic virus transcripts, the 3' end from the opine synthesis genes, the 3' 
ends of ribulose 1,5-bisphosphate carboxylase or chlorophyll a/b binding protein, or 3' 
end sequences from any source such that the sequence employed provides the necessary 
regulatory information within its nucleic acid sequence to result in the proper expression 
of the promoter/modified Brazil Nut 2S albumin protein coding region combination to 
which it is operably linked. There are numerous examples in the art that teach the 
usefulness of different 3' non-coding regions (for example, see Ingelbrecht et al. (1989) 
Plant Cell 1, 671-680). 

DNA sequences coding for intracellular localization sequences may be added to 
the modified Brazil Nut 2S albumin protein coding sequence if required for the proper 
expression of the proteins to accomplish the invention. Thus the signal sequence from 
the β subunit of phaseolin from the bean Phaseolus vulgaris, or the signal sequence 
from the α' subunit of β-conglycinin from soybean (Doyle et al. (1986) J. Biol. Chem. 
261, 9228-9238), can be employed. Hoffman et al. ((1987) EMBO J. 6, 3213-3221) 
showed that the signal sequence of the monocot precursor of a 15 kD zein directed the 
protein into the secretory pathway and was also correctly processed in transgenic 
tobacco seeds. However, the protein did not remain within the endoplasmic reticulum 
as is the case in corn. To retain the protein in the endoplasmic reticulum it may be 
necessary to add stop transit sequences. It is known in the art that the addition of DNA 
sequences coding for the amino acid sequence (Lys-Asp-Glu-Leu) at the carboxyl 
terminal of the protein retains proteins in the lumen of the endoplasmic reticulum 
(Munro et al. (1987) Cell 48, 899-907; Pelham (1988) EMBO J. 7, 913-918; Pelham 
et al. (1988) EMBO J. 7, 1757-1762; Inohara et al. (1989) Proc. Natl. Acad. Sci. U.S.A. 
86, 3564-3568; Hesse et al. (1989) EMBO J. 8, 2453-2461). In some plants seed 
storage proteins are located in the vacuoles of the cell. In order to accomplish the 
invention it may be necessary to direct the modified Brazil Nut 2S albumin protein to 
the vacuole of these plants by adding a vacuolar targeting sequence. A short amino acid 
domain that serves as a vacuolar targeting sequence has been identified from bean 
phytohemagglutinin which accumulates in protein storage vacuoles of cotyledons 
(Tague et al. (1990) Plant Cell 2, 533-546). In another report a carboxyl-terminal 
amino acid sequence necessary for directing barley lectin to vacuoles in transgenic 
tobacco was described (Bednarek et al. (1990) Plant Cell 2, 1145-1155). 
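
As an illustrative sketch only, the addition of a sequence coding for Lys-Asp-Glu-Leu at the carboxyl terminus, mentioned above, can be expressed in Python as inserting four codons immediately before the stop codon. The function name, the particular codons chosen (AAG, GAT, GAG, CTT), and the toy sequence are assumptions for the example; no such construct is described in this specification.

```python
# Hypothetical illustration: append codons for Lys-Asp-Glu-Leu before the
# stop codon of a coding sequence.
STOP_CODONS = {"TAA", "TAG", "TGA"}
KDEL_CODONS = "AAGGATGAGCTT"  # Lys (AAG), Asp (GAT), Glu (GAG), Leu (CTT)


def add_er_retention_signal(cds: str, signal_codons: str = KDEL_CODONS) -> str:
    """Insert the signal codons between the last sense codon and the stop codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence must be a whole number of codons")
    body, stop = cds[:-3], cds[-3:]
    if stop not in STOP_CODONS:
        raise ValueError("coding sequence must end with a stop codon")
    return body + signal_codons + stop


# Toy sequence (Met-Ala-stop) for illustration only.
toy_with_signal = add_er_retention_signal("ATGGCTTAA")
```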
Construction of chimeric genes for expression of Brazil Nut 2S in plants 

Three specific gene expression cassettes were used for construction of chimeric 
genes for expression of 2S in plants to explore expression of altered forms of the gene in 
a plant host. Specifically, the forms explored were those variants of the 2S gene with 
conservative replacements, as exemplified by Arg to Lys, and also an example of 
non-conservative changes, as in BN19. The expression cassettes contained the regulatory 
regions from two highly expressed seed storage protein genes: 

1) the promoter of the highly expressed storage protein, β-conglycinin of 
soybean; and 

2) the 3'-termination sequence of phaseolin from Phaseolus vulgaris. 

The precursor sequence of one of the 2S albumin genes from Arabidopsis thaliana 
was introduced in-frame at the 5'-end of the 2S native gene and selected variants to give 
AT2S1BNwt, AT2S1BN15, AT2S1BN19 and AT2S1BN153W. The precursor versions 
of these genes were then ligated between the β-conglycinin promoter and the 
3'-phaseolin termination region (Slightom et al., (1991) Plant Mol. Biol. Man. B16, 1-55) 
in plasmid pCW109. The vector pCW109 was made by inserting into the HindIII site 
of the cloning vector pUC18 a 555 bp 5' non-coding region (containing the promoter 
region) of the β-conglycinin gene followed by the multiple cloning sequence containing 
the restriction endonuclease sites for Nco I, Sma I, Kpn I and Xba I, then 1174 bp of the 
common bean phaseolin 3' untranslated region into the HindIII site (described above). 
This plasmid allows the precursor, mature and flanking regulatory regions to be isolated 
as one large HindIII fragment after amplification and isolation from E. coli (Odell et al., 
(1994) Plant Physiol. 106, 447-458). Introduction of the HindIII fragment into the 
same site of pZS96 (Odell et al., (1994) Plant Physiol. 106, 447-458) positions the 
segment conveniently between the left and right border DNA sequences of the 
Ti plasmid of Agrobacterium tumefaciens effective in infecting plant hosts. 

Various methods of transforming cells of higher plants according to the present 
invention are available to those skilled in the art. A method that found particular use in 
this case was the infection of Arabidopsis plants by vacuum infiltration as described by 
Bechtold et al. ((1993) C.R. Acad. Sci. Paris 316, 1194-1199). Seeds of T3 and T4 
generations of Arabidopsis plants harboring the Brazil Nut 2S wild type albumin gene 
and enhanced variants were studied for altered amino acid enrichment of Lys and Met 
residues to show that significant increases in the amounts of these two essential amino 
acids are detectable. 
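
To illustrate what enrichment in Lys and Met means in terms of composition, the following Python sketch computes the percentage of selected residues in a protein sequence given in one-letter code. It is a hypothetical helper, not the analytical method used on the seeds, and the two short sequences are toy examples rather than sequences from Figure 9.

```python
# Hypothetical illustration: percentage of chosen residues in a protein.
from collections import Counter


def residue_percentages(protein: str, residues: tuple[str, ...] = ("K", "M")) -> dict[str, float]:
    """Return the percentage (by residue count) of each requested amino acid."""
    protein = protein.upper()
    if not protein:
        raise ValueError("protein sequence is empty")
    counts = Counter(protein)
    return {r: 100.0 * counts[r] / len(protein) for r in residues}


# Toy sequences: the second has both Arg (R) residues replaced by Lys (K).
toy_wild_type = residue_percentages("MKRRA")
toy_enriched = residue_percentages("MKKKA")
```

In this toy case the Lys content rises while the Met content and the sequence length stay the same, which is the pattern expected from a conservative Arg to Lys replacement.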

EXAMPLES 

The present invention is further defined in the following Examples, in which all 
parts and percentages are by weight and degrees are Celsius, unless otherwise stated. It 
should be understood that these Examples, while indicating preferred embodiments of 
the invention, are given by way of illustration only. From the above discussion and 
these Examples, one skilled in the art can ascertain the essential characteristics of this 
invention, and without departing from the spirit and scope thereof, can make various 
changes and modifications of the invention to adapt it to various usages and conditions. 

Standard recombinant DNA and molecular cloning techniques used herein are 
well known in the art and are described more fully in Sambrook, J., Fritsch, E.F. and 
Maniatis, T. Molecular Cloning: A Laboratory Manual; Cold Spring Harbor 
Laboratory Press: Cold Spring Harbor, 1989. 

EXAMPLE 1 
Molecular Cloning of the Brazil Nut 2S gene 
Genes and cDNAs encoding the Brazil nut 2S albumin have been extensively 
described (Altenbach et al., (1987) Plant Mol. Biol. 8, 239-250; Gander et al., (1991) 
Plant Mol. Biol. 16, 437-448) and have even been constructed de novo on the basis of 
published sequences (Saalbach et al. (1994) Mol. Gen. Genet. 242, 226-236). The 
333 bp encoding the mature sequence is shown in Fig. 1, SEQ ID NO:1. The starting 
Brazil nut 2S albumin sequence (SEQ ID NO:1) used herein is supplemented with an 
initiation codon to facilitate expression in prokaryotic cells. An NdeI-EcoRI fragment 
encompassing the mature sequence was ligated into a derivative of pET3a (Novagen 
Inc., Madison, WI) that has the NcoI site replaced by a short multiple cloning site 
containing Nde I and EcoRI. This plasmid, pET3am, was digested with NdeI and EcoRI 
to accept the BNwt gene fragment, giving the plasmid pBNwt; see Figure 3. This 
plasmid, which also carries the gene for β-lactamase, was used to transform competent 
E. coli (JM83) cells that were grown in SOC medium (Hanahan, D. (1983) J. Mol. Biol. 
166, 557) before selecting for plasmid-bearing organisms with ampicillin. The cells 
were streaked onto agarose-LB plates that contained ampicillin (50 µg/mL) and grown 
overnight at 37°. A single colony was picked and inoculated into 50 mL of LB medium 
also containing ampicillin. The culture was shaken at 37° until the cells had reached an 
OD600 of about 3.0. The cells were harvested by centrifugation and the DNA isolated 
and purified using the procedures described by the suppliers of the Promega Wizard™ 
kit. The purified DNA was verified by restriction site digestion and electrophoretic 
separation of the fragments on 1% agarose gels. The 2S gene in the plasmid was also 
sequenced in both directions using primers that annealed to the vector sequence close to 
the T7 promoter outside the 5' end of the coding region and the 3' end at the T7 
terminator. 
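
Verification by restriction digestion relies on comparing observed fragment sizes with those expected from the sequence. The Python sketch below shows, under simplifying assumptions, how expected fragment lengths might be predicted for a linear DNA molecule. The function name and the toy sequence are hypothetical; the recognition sequences and top-strand cut positions used for NdeI (CA^TATG) and EcoRI (G^AATTC) are the standard ones. Circular plasmids and overlapping sites are not handled.

```python
# Hypothetical illustration: predicted fragment lengths from a linear digest.
def digest_linear(seq: str, sites: dict[str, tuple[str, int]]) -> list[int]:
    """Return fragment lengths (5'->3' order) after cutting at every site.

    `sites` maps an enzyme name to (recognition sequence, cut offset), where
    the offset is the number of bases of the site left on the upstream fragment.
    """
    seq = seq.upper()
    cuts = set()
    for recognition, offset in sites.values():
        start = seq.find(recognition)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(recognition, start + 1)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


ENZYMES = {"NdeI": ("CATATG", 2), "EcoRI": ("GAATTC", 1)}

# Toy 24-base molecule carrying one NdeI site and one EcoRI site.
toy_dna = "AAAA" + "CATATG" + "CCCCCC" + "GAATTC" + "TT"
toy_fragments = digest_linear(toy_dna, ENZYMES)
```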

EXAMPLE 2 
Modification of the Brazil Nut 2S Albumin Gene by 
Mutagenesis of Single-Stranded DNA 
The 2S gene was excised from pBNwt using unique EcoRI and XbaI sites. The 
fragment was ligated into M13mp18 that had also been cut with EcoRI and XbaI. The 
resulting plasmid, pM13BNwt, thus allowed either double- or single-stranded forms of 
the gene to be isolated. The first series of mutations that resulted in conversion of Arg 
codons to Lys codons was achieved using the single-stranded form of pM13BNwt. 
The Muta-Gene™ kit (Biorad Laboratories, Richmond, CA) provides a means of 
strongly selecting against the non-mutagenised strand of double-stranded DNA. This 
was achieved by transforming pM13BNwt into competent cells of E. coli CJ236, a host 
that has a double mutation in the dut and ung genes. Some of the thymines in the DNA 
are stably replaced by uracil in this double mutant. The transformants were grown on 
LB plates containing chloramphenicol. One of the colonies was grown overnight in 
chloramphenicol-containing medium and the single-stranded DNA containing uracil 
was then isolated as a phagemid as described by the suppliers of the kit. 

The four oligonucleotides used to introduce mutations into the 2S gene (SEQ ID 
NOs:3-6) were first phosphorylated with T4 polynucleotide kinase at 37° for 60 min and 
the reaction stopped by heating at 65° for 10 min. The oligonucleotides were 
simultaneously annealed to the uracil-enriched single-stranded DNA of pM13BNwt at 
70° followed by slow cooling to room temperature over 40 min. The resulting partial 
duplex was then stored on ice before the double-stranded DNA was generated using T4 
polymerase to extend the oligonucleotides in the presence of all four dNTPs and T4 
DNA ligase to ligate the ends of the extended DNA. The resulting double-stranded 
DNA was used to transform E. coli MV1190 cells that have active uracil-N-glycosylase 
that inactivates the uracil-containing strand so that only the mutant strand replicates. 
The transformed cells generated the double-stranded form of the M13 derivative; this 
was assessed by restriction analysis to ensure that the four unique restriction sites SacII, 
StuI, Nhe I and Cla I engineered into the oligonucleotides had been incorporated as a 
result of the manipulations, and that the internal NcoI site of the wild type gene was 
eliminated. The modified gene was excised from the M13 construct using EcoRI and 
XbaI and ligated back into pET3am. The resulting plasmid, pBNCNSS, contains a 
Brazil Nut 2S albumin gene with seven Arg codons replaced by Lys codons at positions 
corresponding to amino acids 37, 58, 63, 82, 83, 86, and 101 of the wild type protein, 
accompanied by a Met to phenylalanine (Phe) change at position 104 (Figure 2; SEQ ID NO:7). 
EXAMPLE 3 

Cassette Mutagenesis of the Brazil Nut 2S Albumin Gene 

The modified 2S gene from pBNCSS was isolated by restriction enzyme digestion
with NdeI and HindIII, and ligated into pET24a to give pETBNCNSS. The series of
mutations that replaced a further four of the Arg residues with Lys were localized in the
N-terminal half of the gene and were accomplished by annealing synthetic
complementary oligonucleotides that coded for the altered sequence from the NdeI site
to the SacII site (SEQ ID NOs:8 and 9). The individual 131 base oligonucleotides were
first purified by electrophoresis on an 8% polyacrylamide gel. The band was excised from
the gel, eluted, and washed prior to annealing at 90° for 3 min. The annealing solution
was then cooled slowly to 30° and placed on ice for 3 min. The oligonucleotides were
designed with extended ends beyond the NdeI and SacII sites so that, following
annealing, the double-stranded cassette could be digested with these two enzymes to
produce a high percentage of clean 'restriction' ends. The resulting efficiency and
consistency of ligation into the NdeI/SacII-digested pETBNCSS vector with this
cassette approach was evident from the number of transformants carrying the synthetic
oligonucleotide insert. The isolated vector was validated with respect to the correct
insertion of the cassette by restriction analysis to show the presence of the new SphI
site introduced with the insert and by sequencing the region of the gene coding for the
N-terminal segment of the protein. The resulting construct, pETBN11, comprising the
BN11 gene, contained twelve Lys codons (Figure 3; SEQ ID NO:10).
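
The cassette design can be sanity-checked computationally. The sketch below is an illustration only, with `reverse_complement` and `overhang_lengths` as hypothetical helper names: it confirms that the two 131 base oligonucleotides (SEQ ID NOs:8 and 9) are exact reverse complements, so that they anneal into a blunt duplex, and measures how many bases extend beyond the NdeI (CATATG) and SacII (CCGCGG) sites.

```python
# Oligonucleotides transcribed from SEQ ID NOs:8 and 9.
SEQ8 = ("GGGGAAGCTTCATATGCAGGAGGAGTGTAAAGAGCAGATGCAGAAACAGAAGATGCTCAG"
        "CCACTGCAAGATGTACATGAAACAGCAGATGGAGGAGAGCCCGTACCAGAGCATGCCGCG"
        "GAAGGGAATGG")
SEQ9 = ("CCATTCCCTTCCGCGGCATGCTCTGGTACGGGCTCTCCTCCATCTGCTGTTTCATGTACA"
        "TCTTGCAGTGGCTGAGCATCTTCTGTTTCTGCATCTGCTCTTTACACTCCTCCTGCATAT"
        "GAAGCTTCCCC")

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def overhang_lengths(seq, five_prime_site, three_prime_site):
    """Count bases lying outside the two sites at the 5' and 3' ends of seq."""
    start = seq.index(five_prime_site)
    end = seq.rindex(three_prime_site) + len(three_prime_site)
    return start, len(seq) - end


if __name__ == "__main__":
    print("complementary:", reverse_complement(SEQ8) == SEQ9)
    print("extensions (5', 3'):", overhang_lengths(SEQ8, "CATATG", "CCGCGG"))
    print("SphI site present:", "GCATGC" in SEQ8)
```

The ten-base extensions on either side give the enzymes double-stranded DNA to bind beyond each site, which is consistent with the stated aim of producing clean restriction ends.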

The complete replacement of all codons for Arg residues with ones for Lys in the
gene was accomplished in a similar fashion but using an oligonucleotide cassette
designed to replace the portion of the gene encoding the C-terminal half of the protein.
This region of the gene is readily replaced using the convenient SphI and HindIII sites
in the middle and 3'-end of the gene, respectively. The opportunity was also taken to
remove much of the non-coding 3'-end of the original constructs. The number of bases
this region covers is too large to be substituted by a cassette formed from only two
oligonucleotides. Accordingly, four oligonucleotides were designed that would
ultimately be ligated together before insertion into the appropriately cut vector. The
first half of the cassette, using the two oligonucleotides displayed as SEQ ID NOs:11
and 14, had an SphI site located 8 bases from the 5'-end and a StyI site located 8 bases
from the 3'-end. The second half of the cassette (SEQ ID NOs:12 and 13) was arranged
with a StyI site 8 bases from the 5'-end and a HindIII site located some 5 bases from the
3'-end. Once each pair of the two halves had been independently annealed, they were
digested with StyI, and the StyI ends ligated. The resulting ligated double cassette was
isolated from an 8% polyacrylamide gel, washed and digested with SphI and HindIII
before ligating into pETBN11 that had previously been digested with SphI and HindIII
and isolated from a 1% agarose gel.
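
The site offsets quoted for the four-oligonucleotide cassette, and the joining of its two halves, can be reproduced with a short script. This is a simplified top-strand model offered as a sketch: `flanks` and `join_at_styi` are hypothetical helpers, only SEQ ID NOs:11 and 12 are used (their complements, SEQ ID NOs:14 and 13, are implied), and StyI is modelled by its standard recognition sequence CCWWGG, cut after the first C.

```python
import re

# Top-strand oligonucleotides transcribed from SEQ ID NOs:11 and 12.
SEQ11 = ("GTACCAGAGCATGCCGAAGAAGGGAATGGAGCCGCACATGAGCGAGTGCTGCGAGCAGCT"
         "GGAGGGGATGGACGAGAGCTGCAAATGCGAAGGCCTAAAGATGATGATGATGAAGATGCA"
         "ACAGGAGGAGATGCAACCCAAGGGGGAGCAG")
SEQ12 = ("GATGCAACCCAAGGGGGAGCAGATGAAAAAGATGATGAAGCTAGCCGAGAATATCCCTTC"
         "CAAATGCAACCTCAGTCCCATGAAATGCCCCTTCGGTGGATCGATTGCCGGGTTCTGAGG"
         "ATCCGAATTCAAGCTTGCGGC")

STYI = re.compile("CC[AT][AT]GG")  # StyI: CCWWGG, W = A or T


def flanks(seq, pattern):
    """Return (bases before, bases after) the first match of pattern in seq."""
    match = re.search(pattern, seq)
    return match.start(), len(seq) - match.end()


def join_at_styi(left, right):
    """Top-strand model: cut both halves after the first C of CCWWGG, then ligate."""
    cut_left = STYI.search(left).start() + 1
    cut_right = STYI.search(right).start() + 1
    return left[:cut_left] + right[cut_right:]


if __name__ == "__main__":
    print("SEQ 11 SphI flanks:", flanks(SEQ11, "GCATGC"))
    print("SEQ 11 StyI flanks:", flanks(SEQ11, STYI))
    print("SEQ 12 StyI flanks:", flanks(SEQ12, STYI))
    print("SEQ 12 HindIII flanks:", flanks(SEQ12, "AAGCTT"))
    print("joined cassette length:", len(join_at_styi(SEQ11, SEQ12)))
```

In this model the ligated product regenerates a single StyI site at the junction, in line with the later use of a new StyI site as a diagnostic for successful construction.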

Transformants of competent E. coli carrying the altered vector were isolated and
the DNA purified. The DNA was validated by restriction analysis before sequencing
the appropriate region of the gene. In this case, the introduction of a StyI site and
removal of SacII was diagnostic of successful construction. The derivative of
pETBN11, now with all fifteen Arg replaced by Lys, was designated pETBN15 and
encoded the modified BN15 gene (Figure 4; SEQ ID NO:15).

EXAMPLE 4 

Further Enhanced Mutations of the Brazil Nut 2S Albumin Gene 
Other mutations were introduced into the gene to explore whether other residues
with non-basic sidechains might be sites that are suitable for replacing with essential
amino acids. The plasmid used to make these changes was pETBN15. Two residues
chosen for replacement with Lys were Gly 105 and Ser 107. The relevant region of the
gene has NheI and BamHI restriction sites that are convenient for the purpose of
introducing substitutions. The 85 bp NheI-BamHI segment was replaced in pETBN15
with a cassette synthesized with the NheI and BamHI sites indented by 9 bases (SEQ ID
NOs:16 and 17). The cassette was first digested with the two restriction enzymes to
provide clean NheI and BamHI ends and the fragment purified by gel electrophoresis.
The purified fragment was ligated into pETBN15 that had been cut with the same
enzymes. The resulting plasmid was termed pETBN17 and encoded the modified
Brazil Nut 2S albumin gene designated BN17 (Figure 5; SEQ ID NO:18).
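
The effect of the NheI-BamHI cassette can be checked by translating it alongside the stretch of BN15 it replaces. The sketch below is illustrative: it uses the standard genetic code, assumes that the first complete codon of SEQ ID NO:16 (offset 1) encodes Met 84 as read from the numbering in SEQ ID NO:15, and uses hypothetical names (`translate`, `substitutions`, `BN15_SEGMENT`).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Cassette top strand transcribed from SEQ ID NO:16.
SEQ16 = ("GATGATGAAGCTAGCCGAGAATATCCCTTCCAAATGCAACCTCAGTCCCATGAAATGCCC"
         "CTTCAAAGGAAAGATTGCCGGGTTCTGAGGATCCGAATTC")
# Corresponding stretch of the BN15 gene, transcribed from SEQ ID NO:15.
BN15_SEGMENT = ("GATGATGAAGCTAGCCGAGAATATCCCTTCCAAATGCAACCTCAGTCCCATGAAATGCCC"
                "CTTCGGTGGATCGATTGCCGGGTTCTGAGGATCCGAATTC")

FRAME_OFFSET = 1     # assumption: codons start at the second base
FIRST_RESIDUE = 84   # assumption: first complete codon encodes Met 84


def translate(seq, offset=0):
    """Translate seq from offset until the first stop codon (one-letter code)."""
    protein = []
    for i in range(offset, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def substitutions(old, new, first=FIRST_RESIDUE):
    """List differences between two equal-length proteins, e.g. 'G105K'."""
    return [f"{a}{first + i}{b}" for i, (a, b) in enumerate(zip(old, new)) if a != b]


if __name__ == "__main__":
    bn15 = translate(BN15_SEGMENT, FRAME_OFFSET)
    bn17 = translate(SEQ16, FRAME_OFFSET)
    print(bn17)
    print(substitutions(bn15, bn17))
```

Under these assumptions the only differences between the two translations are the Gly 105 and Ser 107 replacements described in the text.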

Using similar procedures, the serine (Ser) residue at position 44 was replaced by
Lys in pETBN17 using a cassette formed by annealing the oligonucleotides shown in
SEQ ID NOs:19 and 20 and replacement of the SphI-StuI segment of the gene BN17.






The resulting construct was designated pETBN18 and encoded the modified Brazil Nut 2S
albumin gene BN18 (Figure 6; SEQ ID NO:21).

Replacement of the NdeI-SphI fragment of pETBNlS with a cassette formed by
annealing the oligonucleotides depicted in SEQ ID NOs:22 and 23 resulted in
alterations of Glu 4 and 28 to Lys and Glu 27 to Met. This construct was designated
pETBN19 and encoded the BN19 mutant (Figure 7; SEQ ID NO:24). Replacement of the
NdeI-StuI fragment of pETBN15 with two ligated cassettes formed by annealing the
oligonucleotides depicted in SEQ ID NOs:25 and 26 and the oligonucleotides depicted in
SEQ ID NOs:27 and 28 introduced three further Lys at positions 4, 16 and 41, replacing
Glu, Ser and Pro, respectively, and one Trp at position 42 replacing His, to give
pETBN153KW, encoding the modified gene BN153KW (Figure 8; SEQ ID NO:29).
The base changes that resulted in Ser to Lys also introduced an AflII site into the gene,
and the opportunity was taken to introduce an XhoI site through a silent mutation
changing base 150 from G to C.

EXAMPLE 5

Construction of a Precursor Form of the Brazil Nut 2S Albumin Genes
The 5'-end of the Brazil Nut 2S albumin gene was extended in-frame to include
the 37 amino acid precursor sequence of the Arabidopsis 2S albumin protein that should
contain all the information for the correct processing and targeting of the protein in the
plant. Two 137 base oligonucleotides were synthesized (SEQ ID NOs:30 and 31) with
recessed NcoI and NdeI sites. The fragments were purified and isolated as described
above before annealing together to form the double-stranded cassette. The cassette was
also purified and isolated from an 8% polyacrylamide gel before restriction digestion
with NcoI and NdeI to produce the clean 5'- and 3'-ends.

The pETBN15 vector was cut with NdeI and HindIII, and the fragment
containing the BN15 gene was purified from the remaining vector using polyacrylamide
gels and subsequent elution. The NdeI sites of the cassette and the BN15 gene were
then ligated together to produce the extended gene sequence with NcoI and HindIII sites
at the 5'- and 3'-ends, respectively. The extended gene was then ligated into pET24d
(Novagen Inc., Madison, WI) previously linearized by NcoI and HindIII digestion. The
resulting vector was designated pAT2S1BN15 and contains the AT2S1BN15 gene
(Figure 10; SEQ ID NO:32).

This construct was then used as the vector into which all versions of the Brazil
Nut 2S albumin gene could be inserted to give the 5'-extended gene. For example,
construction of the wild type version of the precursor-containing gene was
accomplished by replacing the NdeI-HindIII fragment of pAT2S1BN15 with the
NdeI-HindIII fragment of pETBNwt that has the wild type sequence of the gene.
Likewise the replacement of the NdeI-HindIII fragment with that of pETBN19
generated pAT2S1BN19, containing the modified AT2S1BN19 gene (Figure 11; SEQ
ID NO:33).

Smaller segments of the vector could also be manipulated to enhance the amino
acid content of the protein products. For example, the NheI to HindIII fragment of
pAT2S1BN15 was replaced with a cassette composed of the oligonucleotides depicted
in SEQ ID NOs:22 and 23. This replaced Glu 89, Ser 93 and Phe 104 with Trp, giving
pAT2S1BN153W (Figure 12; SEQ ID NO:36).

EXAMPLE 6 

Construction of the Plant-Specific Expression Cassette

The wild type Brazil Nut 2S albumin gene and the modified genes BN15 and
BN19 were positioned between the promoter that is normally responsible for controlling
conglycinin expression and the 3' region normally found downstream of the phaseolin
gene. The vector that contained these control elements (pCW109) was cut with NcoI
and SmaI. The plasmids containing the wild type and modified Brazil Nut 2S albumin
genes (pAT2S1BNwt, pAT2S1BN15 and pAT2S1BN19) were first cut with EcoRI and
blunt-ended with mung bean nuclease. The DNA was precipitated from a solution
containing 0.01% SDS and 0.1 M NaCl using two volumes of cold, dry ethanol. The
gene encoding the 2S precursor-containing protein was excised from the resulting
linearised DNA using NcoI. The NcoI (5') blunt-ended (3') fragments from
pAT2S1BN15 and pAT2S1BN19 were then ligated into pCW109 that was previously
linearised by digestion with NcoI and SmaI. The resulting constructs were designated
pCW109BN15 and pCW109BN19. Construction of the equivalent version that included
the BNwt gene was a little more involved since this gene still has an internal NcoI site.
In this case a partial digest of pAT2S1BNwt with NcoI after nuclease treatment of the EcoRI
site readily provided the fragment of interest that encompassed the complete gene for
the precursor protein, which could also be ligated into NcoI/SmaI-digested pCW109. The
resulting vector was designated pCW109BNwt.

EXAMPLE 7

Construction of the Binary Vector Useful for Plant Transformation

The construction of a vector suitable for plant transformation requires the
presence of the right and left border sequences that Agrobacterium utilizes to introduce
foreign DNA into the nuclear genome of plants, and also encompasses a selection
cassette that allows for antibiotic selection of those plants that show successful
integration of the foreign DNA into their genome. Ideally a second selection cassette
should also be available for selecting those bacteria transformed with the binary vector
for manipulation or amplification.

The vector chosen to achieve all these desired features was pzs96. pzs96 has
genes encoding the N-phosphotransferase that imparts kanamycin resistance on
transformed plants and the β-lactamase that imparts ampicillin resistance for selection
in bacteria. Each of the three pCW109AT2S1BN plasmids was digested with HindIII.
The liberated HindIII fragments were purified and then mixed separately with an
equimolar amount of HindIII-linearized pzs96 before ligation. After ligation, the DNA
was used to transform competent E. coli; transformants were selected on media
containing ampicillin. Correct construction was assessed by restriction digest analysis
and DNA sequencing. In this way only those binary vectors with the Brazil Nut 2S
albumin genes in the correct orientation were retained for use in transforming plants.
EXAMPLE 8 
Transformation of Arabidopsis thaliana
The vacuum infiltration methods described in detail by Bechtold et al. ((1993)
C.R. Acad. Sci. Paris 316, 1194-1199) were used to infect Arabidopsis thaliana
(ecotype Wassilewskija) with the binary vectors described above. Five to ten plants
were grown for 3-5 weeks in pots covered by a fine nylon screen stretched across the
top of the pot at the time of seeding to prevent loss of soil. A suspension of
Agrobacterium that had previously been grown overnight in LB containing kanamycin
(25 ug/mL), rifampicin (50 ug/mL) and carbenicillin (100 ug/mL) was dispersed in
1 liter of infiltration medium to give an OD600 of 0.8. The bacterial suspension was
poured into a tray that was placed into the bottom of the vacuum cabinet. The pots were
suspended inverted in the vacuum cabinet and the shoots of the plants submerged in the
solution. The door was closed and the vacuum of a rotary vane oil pump applied for
5 min, reaching a final vacuum of 1.5-2.0 Torr. The plants (T1 generation) were
removed and allowed to grow normally and set seed (4 weeks).

The T2 seeds from this T1 generation were harvested and selected for kanamycin
resistance. This selection entailed sterilization of T2 seeds in 50% commercial bleach
with 0.02% Tween-20 for 8-10 minutes and then washing in sterile water 3-5 times
before sowing. Sterilized seeds were germinated in Petri plates with sterile media
containing 1/2 strength Murashige-Skoog salts (Gibco #11117-066) plus 0.7% agar,
1% sucrose and 50 ug/mL kanamycin. Kanamycin was prepared as a 50 mg/mL stock
in water, sterilized by passage through a 0.2 um filter, and added to the media after it
had been autoclaved and cooled to 60°. Plates containing kanamycin were stored at 4°
and used within one month of preparation. Those plants that had been successfully
infected by Agrobacterium containing the selection cassette which encompasses the
Brazil Nut 2S albumin constructs grew in preference to those without integrated DNA.
The plants that survived this selection were transferred from the Petri plates to pots
containing commercial soil mixes (Metromix™ or others) at 1-3 weeks of age.
Following transplanting, the pots were covered with clear plastic wrap for 3-7 days to
allow the seedlings to adapt to the soil conditions. The plastic was then removed and
plants were grown to maturity using standard practices in growth chambers at 20-25°
with fluorescent and incandescent illumination of 100-300 umol/m^2/sec
photosynthetically active radiation and a photoperiod ranging from 12 h to continuous
illumination. The T2 plants in soil were allowed to self-fertilize to produce the T3
seeds which were harvested and used for analysis.
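
The kanamycin addition in the selection medium is a simple C1 x V1 = C2 x V2 dilution. As a worked illustration (the function name and the batch volumes are assumptions, not taken from the procedure), the stock volume needed for a given batch of medium can be computed as follows:

```python
def stock_volume_ml(stock_mg_per_ml, final_ug_per_ml, final_volume_ml):
    """Volume of stock (mL) that brings the medium to the final concentration.

    Uses C1 * V1 = C2 * V2 and ignores the small volume added by the stock itself.
    """
    stock_ug_per_ml = stock_mg_per_ml * 1000.0
    return final_ug_per_ml * final_volume_ml / stock_ug_per_ml


if __name__ == "__main__":
    # 50 mg/mL stock diluted to 50 ug/mL in a hypothetical 1 L batch of medium.
    print(stock_volume_ml(50, 50, 1000), "mL of stock per liter")
    # The same dilution for a hypothetical single 25 mL plate.
    print(stock_volume_ml(50, 50, 25) * 1000, "uL of stock per plate")
```

The 50 mg/mL stock is therefore used at a 1:1000 dilution, that is, 1 mL of stock per liter of cooled medium.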

To ensure successful integration of the Brazil Nut 2S albumin gene into the
nuclear genome of this selected group of individual plants, total DNA from the leaves
was isolated and PCR used to amplify the Brazil Nut 2S albumin genes and verify that
they had also been integrated along with the kanamycin gene. The DNA fragment that
resulted from the PCR reactions was further analyzed and confirmed by restriction
analysis. The results were compared to the individual examples of the host plant that
had not been subjected to infection. Once the plants had fully matured, the DNA from
seeds was likewise analyzed.

EXAMPLE 9 

Expression of the Brazil Nut 2S Albumin Gene in Transformed Plants 
The seeds from different lines of T2 generation transgenics harboring the
AT2S1BN15, AT2S1BN19 and AT2S1BNwt genes were harvested for analysis. The
seeds (10 mg) from the mature plants were first ground to a fine powder in liquid
nitrogen and then defatted at room temperature with three washes of n-hexane. The resulting
defatted flour was allowed to dry before extraction with a weak acidic buffer
(0.1 M citrate, pH 5.0) to solubilize the 2S proteins; the precipitate of other proteins
was removed by centrifugation. The acid extract was filtered using Microcon™ 0.2 um
filtration units (Amicon Inc., Beverly, MA) and samples of the extracts from the
transgenics were subjected to amino acid analysis and compared with the 2S albumin
extracted from untransformed Arabidopsis (see Table 1).

The seeds (T3) from those lines in the T2 generation that showed the most
increased Met and Lys content of the 2S fractions were sown to provide a T4 set of
seeds for analysis. The seeds were treated as for the previous generation to obtain the
2S protein for analysis and the results are shown in Table 1.

Table 1 shows the percent by weight of Met and Lys in the 2S fraction of
untransformed and transgenic Arabidopsis seeds of the T3 and T4 generations. The
percent by weight of Arg was included to indicate that a concomitant decrease was
observed with the Lys increase, as expected. The percent by weight of valine (Val) is also
reported, a residue not present in the Brazil Nut 2S albumin, thus providing an internal
reference for comparison of the various 2S extractions.



Table 1

The weight percent of Met and Lys in the 2S fraction of
Arabidopsis transgenics producing the modified Brazil Nut 2S albumin

Arabidopsis variant   Seed Generation   %Met   %Lys   %Arg   %Val
C24                   T3                 2.2    7.9    9.8    2.7
BNwt                  T3                 3.6    7.3   10.3    2.6
BN15                  T3                 4.5    9.6    8.7    2.6
BN19                  T3                 4.5   10.4    8.2    2.8
BNwt                  T4                 6.7    7.5   11.6    3.3
BNwt                  T4                 6.5    7.4   11.7    3.3
BN15                  T4                 8.7   11.8    7.6    3.2
BN15                  T4                10.0   13.1    7.5    3.2
BN19                  T4                 8.0   12.5    6.8    3.2
BN19                  T4                 8.0   12.5    7.2    3.1
BN19                  T4                10.3   14.3    6.4    2.7
BN19                  T4                10.4   14.4    6.3    2.7
BN19                  T4                 7.1   12.1    8.0    3.3
BN19                  T4                 7.0   11.9    8.0    3.3
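
The trends in Table 1 can be summarized numerically. The sketch below is an illustration rather than part of the reported analysis: it averages the replicate entries for each variant and generation and expresses the result relative to the BNwt lines of the same generation. Averaging the replicates and the helper names `mean_percent` and `fold_change` are assumptions made here for convenience.

```python
from statistics import mean

# Rows transcribed from Table 1: (variant, generation, %Met, %Lys, %Arg, %Val).
TABLE_1 = [
    ("C24", "T3", 2.2, 7.9, 9.8, 2.7),
    ("BNwt", "T3", 3.6, 7.3, 10.3, 2.6),
    ("BN15", "T3", 4.5, 9.6, 8.7, 2.6),
    ("BN19", "T3", 4.5, 10.4, 8.2, 2.8),
    ("BNwt", "T4", 6.7, 7.5, 11.6, 3.3),
    ("BNwt", "T4", 6.5, 7.4, 11.7, 3.3),
    ("BN15", "T4", 8.7, 11.8, 7.6, 3.2),
    ("BN15", "T4", 10.0, 13.1, 7.5, 3.2),
    ("BN19", "T4", 8.0, 12.5, 6.8, 3.2),
    ("BN19", "T4", 8.0, 12.5, 7.2, 3.1),
    ("BN19", "T4", 10.3, 14.3, 6.4, 2.7),
    ("BN19", "T4", 10.4, 14.4, 6.3, 2.7),
    ("BN19", "T4", 7.1, 12.1, 8.0, 3.3),
    ("BN19", "T4", 7.0, 11.9, 8.0, 3.3),
]
COLUMN = {"Met": 2, "Lys": 3, "Arg": 4, "Val": 5}


def mean_percent(variant, generation, amino_acid):
    """Mean weight percent of one amino acid over all matching rows."""
    idx = COLUMN[amino_acid]
    return mean(r[idx] for r in TABLE_1 if r[0] == variant and r[1] == generation)


def fold_change(variant, generation, amino_acid, reference="BNwt"):
    """Mean for the variant divided by the mean for the reference lines."""
    return (mean_percent(variant, generation, amino_acid)
            / mean_percent(reference, generation, amino_acid))


if __name__ == "__main__":
    for variant in ("BN15", "BN19"):
        for aa in ("Met", "Lys", "Arg"):
            print(f"{variant} T4 {aa}: {fold_change(variant, 'T4', aa):.2f} x BNwt")
```

On these averages the T4 lines carrying BN15 or BN19 show higher Lys and lower Arg than the BNwt lines, which matches the concomitant decrease noted in the text.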




EXAMPLE 10 
Expression of the Brazil Nut 2S Albumin Gene in E. coli 
The vectors pETBNwt, pETBN15, pETBN16, pETBN17, pETBN18 and
pETBN19 were used to transform E. coli (BL21) cells, which were grown in LB medium with
kanamycin (30 ug/mL) selection in 50 mL shake cultures at 37° overnight on an
incubated shaker (300 rpm). The next day, the cells were harvested to make glycerol
stocks for long term storage and 1 mL was used to inoculate a fresh 50 mL batch of
medium with the same selection. When the cells had reached an OD600 of 0.9, protein
expression was induced with 1 mM IPTG. The cells were harvested after overnight
incubation on the shaker at 37°.

The protein content of the cells was analyzed by incubating a fraction of the cell
paste that had been washed with 0.1 M Tris-Cl buffer (pH 8.0) at 100° in an SDS gel
loading buffer. Samples of the lysed cell extract were then run on an 18% SDS
polyacrylamide gel. After the gel had been run it was washed in 0.1 M CAPS
buffer, pH 10, to remove the Tris-Glycine gel running buffer. The proteins were
transferred to PVDF membranes using electrophoretic transblotting procedures and
visualized by Coomassie blue staining. Those bands with the mobility of the 2S storage
protein were identified as the recombinant product by N-terminal sequence and Western
analyses.

SEQUENCE LISTING

(1) GENERAL INFORMATION: 



(i) APPLICANT: 

(A) ADDRESSEE: E. I. DU PONT DE NEMOURS AND COMPANY 

(B) STREET: 1007 MARKET STREET 

(C) CITY: WILMINGTON 

(D) STATE: DELAWARE 

(E) COUNTRY: UNITED STATES OF AMERICA 

(F) ZIP: 19898 

(G) TELEPHONE: 302-992-5481 

(H) TELEFAX: 302-773-0164 

(I) TELEX: 6717325 



(ii) TITLE OF INVENTION: AN ENGINEERED HIGH SULFUR 

CONTAINING SEED PROTEIN CONTAINING 
OTHER ESSENTIAL AMINO ACIDS 



(iii) NUMBER OF SEQUENCES: 36 



(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: DISKETTE, 5.0 INCH 

(B) COMPUTER: IBM PC COMPATIBLE 

(C) OPERATING SYSTEM: MICROSOFT WINDOWS 95 

(D) SOFTWARE: MICROSOFT WORD VERSION 7.0A



CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 



PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 60/042,827 

(B) FILING DATE: APRIL 8, 1997 

ATTORNEY/AGENT INFORMATION: 

(A) NAME: CHRISTENBURY, LYNNE M. 

(B) REGISTRATION NUMBER: 30,971 

(C) REFERENCE/DOCKET NUMBER: BB-1069

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 493 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 14..346

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

GGGGAACCTT CAT ATG CAG GAG GAG TGT CGC GAG CAG ATG CAG AGA CAG
               Met Gln Glu Glu Cys Arg Glu Gln Met Gln Arg Gln
               1               5                   10

CAG ATG CTC AGC CAC TGC CGG ATG TAC ATG AGA CAG CAG ATG GAG GAG
Gln Met Leu Ser His Cys Arg Met Tyr Met Arg Gln Gln Met Glu Glu
        15                  20                  25

AGC CCG TAC CAG ACC ATG CCC AGG CGG GGA ATG GAG CCG CAC ATG AGC
Ser Pro Tyr Gln Thr Met Pro Arg Arg Gly Met Glu Pro His Met Ser
    30                  35                  40

GAG TGC TGC GAG CAG CTG GAG GGG ATG GAC GAG AGC TGC AGA TGC GAA
Glu Cys Cys Glu Gln Leu Glu Gly Met Asp Glu Ser Cys Arg Cys Glu
45                  50                  55                  60

GGC TTA AGG ATG ATG ATG ATG AGG ATG CAA CAG GAG GAG ATG CAA CCC
Gly Leu Arg Met Met Met Met Arg Met Gln Gln Glu Glu Met Gln Pro
                65                  70                  75

CGA GGG GAG CAG ATG CGA AGG ATG ATG AGG CTG GCC GAG AAT ATC CCT
Arg Gly Glu Gln Met Arg Arg Met Met Arg Leu Ala Glu Asn Ile Pro
            80                  85                  90

TCC CGC TGC AAC CTC AGT CCC ATG AGA TGC CCC ATG GGT GGC TCC ATT
Ser Arg Cys Asn Leu Ser Pro Met Arg Cys Pro Met Gly Gly Ser Ile
        95                  100                 105

GCC GGG TTC TGAATCTGCC ACTAGCCAGT GCTGTAAATG TTAATAAGGC
Ala Gly Phe
    110

TCTCACAAAC TAGCTCTTTG TTGGCTTTTG GCCGGAGACT AGGGTGTGGG GAATTCGAGC
TCGGTACCCG GGGATCCTCT AGAGTCGACC TGCAGGCATG CAAGCTT

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 111 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Gln Glu Glu Cys Arg Glu Gln Met Gln Arg Gln Gln Met Leu Ser
1               5                   10                  15

His Cys Arg Met Tyr Met Arg Gln Gln Met Glu Glu Ser Pro Tyr Gln
            20                  25                  30

Thr Met Pro Arg Arg Gly Met Glu Pro His Met Ser Glu Cys Cys Glu
        35                  40                  45

Gln Leu Glu Gly Met Asp Glu Ser Cys Arg Cys Glu Gly Leu Arg Met
    50                  55                  60

Met Met Met Arg Met Gln Gln Glu Glu Met Gln Pro Arg Gly Glu Gln
65                  70                  75                  80

Met Arg Arg Met Met Arg Leu Ala Glu Asn Ile Pro Ser Arg Cys Asn
                85                  90                  95

Leu Ser Pro Met Arg Cys Pro Met Gly Gly Ser Ile Ala Gly Phe
            100                 105                 110

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

CCAGACCATG CCGCGGAAGG GAATGGAG 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
GAGCTGCAAA TGCGAAGGCC TAAAGATGAT GATG 34 
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
GGGAGCAGAT GAAAAAGATG ATGAAGCTAG CCGAGAATA 39 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
GTCCCATGAA ATGCCCCTTC GGTGGATCGA TTGCCGGG 38 
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 336 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..333 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

ATG CAG GAG GAG TGT CGC GAG CAG ATG CAG AGA CAG CAG ATG CTC AGC  48
Met Gln Glu Glu Cys Arg Glu Gln Met Gln Arg Gln Gln Met Leu Ser
1               5                   10                  15

CAC TGC CGG ATG TAC ATG AGA CAG CAG ATG GAG GAG AGC CCG TAC CAG  96
His Cys Arg Met Tyr Met Arg Gln Gln Met Glu Glu Ser Pro Tyr Gln
            20                  25                  30

ACC ATG CCG CGG AAG GGA ATG GAG CCG CAC ATG AGC GAG TGC TGC GAG  144
Thr Met Pro Arg Lys Gly Met Glu Pro His Met Ser Glu Cys Cys Glu
        35                  40                  45

CAG CTG GAG GGG ATG GAC GAG AGC TGC AAA TGC GAA GGC CTA AAG ATG  192
Gln Leu Glu Gly Met Asp Glu Ser Cys Lys Cys Glu Gly Leu Lys Met
    50                  55                  60

ATG ATG ATG AGG ATG CAA CAG GAG GAG ATG CAA CCC CGA GGG GAG CAG  240
Met Met Met Arg Met Gln Gln Glu Glu Met Gln Pro Arg Gly Glu Gln
65                  70                  75                  80

ATG AAA AAG ATG ATG AAG CTA GCC GAG AAT ATC CCT TCC CGC TGC AAC  288
Met Lys Lys Met Met Lys Leu Ala Glu Asn Ile Pro Ser Arg Cys Asn
                85                  90                  95

CTC AGT CCC ATG AAA TGC CCC TTC GGT GGA TCG ATT GCC GGG TTC  333
Leu Ser Pro Met Lys Cys Pro Phe Gly Gly Ser Ile Ala Gly Phe
            100                 105                 110

TGA  336

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 131 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
GGGGAAGCTT CATATGCAGG AGGAGTGTAA AGAGCAGATG CAGAAACAGA AGATGCTCAG 60 
CCACTGCAAG ATGTACATGA AACAGCAGAT GGAGGAGAGC CCGTACCAGA GCATGCCGCG 120 
GAAGGGAATG G 131 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 131 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: other nucleic acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
CCATTCCCTT CCGCGGCATG CTCTGGTACG GGCTCTCCTC CATCTGCTGT TTCATGTACA 60 
TCTTGCAGTG GCTGAGCATC TTCTGTTTCT GCATCTGCTC TTTACACTCC TCCTGCATAT 120 
GAAGCTTCCC C 131 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 493 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 14..346

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

GGGGAACCTT CAT ATG CAG GAG GAG TGT AAA GAG CAG ATG CAG AAA CAG  49
               Met Gln Glu Glu Cys Lys Glu Gln Met Gln Lys Gln
               1               5                   10

AAG ATG CTC AGC CAC TGC AAG ATG TAC ATG AAA CAG CAG ATG GAG GAG  97
Lys Met Leu Ser His Cys Lys Met Tyr Met Lys Gln Gln Met Glu Glu
        15                  20                  25

AGC CCG TAC CAG AGC ATG CCG CGG AAG GGA ATG GAG CCG CAC ATG AGC  145
Ser Pro Tyr Gln Ser Met Pro Arg Lys Gly Met Glu Pro His Met Ser
    30                  35                  40

GAG TGC TGC GAG CAG CTG GAG GGG ATG GAC GAG AGC TGC AAA TGC GAA  193
Glu Cys Cys Glu Gln Leu Glu Gly Met Asp Glu Ser Cys Lys Cys Glu
45                  50                  55                  60

GGC CTA AAG ATG ATG ATG ATG AGG ATG CAA CAG GAG GAG ATG CAA CCC  241
Gly Leu Lys Met Met Met Met Arg Met Gln Gln Glu Glu Met Gln Pro
                65                  70                  75

CGA GGG GAG CAG ATG AAA AAG ATG ATG AAG CTA GCC GAG AAT ATC CCT  289
Arg Gly Glu Gln Met Lys Lys Met Met Lys Leu Ala Glu Asn Ile Pro
            80                  85                  90

TCC CGC TGC AAC CTC AGT CCC ATG AAA TGC CCC TTC GGT GGA TCG ATT  337
Ser Arg Cys Asn Leu Ser Pro Met Lys Cys Pro Phe Gly Gly Ser Ile
        95                  100                 105

GCC GGG TTC TGAATCTGCC ACTAGCCAGT GCTGTAAATG TTAATAAGGC  386
Ala Gly Phe
    110

TCTCACAAAC TAGCTCTTTG TTGGCTTTTG GCCGGAGACT AGGGTGTGGG GAATTCGAGC  446
TCGGTACCCG GGGATCCTCT AGAGTCGACC TGCAGGCATG CAAGCTT  493

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 151 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
GTACCAGAGC ATGCCGAAGA AGGGAATGGA GCCGCACATG AGCGAGTGCT GCGAGCAGCT 60 
GGAGGGGATG GACGAGAGCT GCAAATGCGA AGGCCTAAAG ATGATGATGA TGAAGATGCA 120 
ACAGGAGGAG ATGCAACCCA AGGGGGAGCA G 151 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 141 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
GATGCAACCC AAGGGGGAGC AGATGAAAAA GATGATGAAG CTAGCCGAGA ATATCCCTTC 60 
CAAATGCAAC CTCAGTCCCA TGAAATGCCC CTTCGGTGGA TCGATTGCCG GGTTCTGAGG 120 
ATCCGAATTC AAGCTTGCGG C 141 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 141 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
GCCGCAAGCT TGAATTCGGA TCCTCAGAAC CCGGCAATCG ATCCACCGAA GGGGCATTTC 60
ATGGGACTGA GGTTGCATTT GGAAGGGATA TTCTCGGCTA GCTTCATCAT CTTTTTCATC 120
TGCTCCCCCT TGGGTTGCAT C 141
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 151 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
CTGCTCCCCC TTGGGTTGCA TCTCCTCCTG TTGCATCTTC ATCATCATCA TCTTTAGGCC 60 
TTCGCATTTG CAGCTCTCGT CCATCCCCTC CAGCTGCTCG CAGCACTCGC TCATGTGCGG 120 

CTCCATTCCC TTCTTCGGCA TGCTCTGGTA C 151 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 367 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 14..346

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

GGGGAACCTT CAT ATG CAG GAG GAG TGT AAA GAG CAG ATG CAG AAA CAG  49
               Met Gln Glu Glu Cys Lys Glu Gln Met Gln Lys Gln
               1               5                   10

AAG ATG CTC AGC CAC TGC AAG ATG TAC ATG AAA CAG CAG ATG GAG GAG  97
Lys Met Leu Ser His Cys Lys Met Tyr Met Lys Gln Gln Met Glu Glu
        15                  20                  25

AGC CCG TAC CAG AGC ATG CCG AAG AAG GGA ATG GAG CCG CAC ATG AGC  145
Ser Pro Tyr Gln Ser Met Pro Lys Lys Gly Met Glu Pro His Met Ser
    30                  35                  40

GAG TGC TGC GAG CAG CTG GAG GGG ATG GAC GAG AGC TGC AAA TGC GAA  193
Glu Cys Cys Glu Gln Leu Glu Gly Met Asp Glu Ser Cys Lys Cys Glu
45                  50                  55                  60

GGC CTA AAG ATG ATG ATG ATG AAG ATG CAA CAG GAG GAG ATG CAA CCC  241
Gly Leu Lys Met Met Met Met Lys Met Gln Gln Glu Glu Met Gln Pro
                65                  70                  75

AAG GGG GAG CAG ATG AAA AAG ATG ATG AAG CTA GCC GAG AAT ATC CCT  289
Lys Gly Glu Gln Met Lys Lys Met Met Lys Leu Ala Glu Asn Ile Pro
            80                  85                  90

TCC AAA TGC AAC CTC AGT CCC ATG AAA TGC CCC TTC GGT GGA TCG ATT  337
Ser Lys Cys Asn Leu Ser Pro Met Lys Cys Pro Phe Gly Gly Ser Ile
        95                  100                 105

GCC GGG TTC TGAGGATCCG AATTCAAGCT T  367
Ala Gly Phe
    110

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 100 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
GATGATGAAG CTAGCCGAGA ATATCCCTTC CAAATGCAAC CTCAGTCCCA TGAAATGCCC 60 
CTTCAAAGGA AAGATTGCCG GGTTCTGAGG ATCCGAATTC 100 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 100 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
GAATTCGGAT CCTCAGAACC CGGCAATCTT TCCTTTGAAG GGGCATTTCA TGGGACTGAG 60 
GTTGCATTTG GAAGGGATAT TCTCGGCTAG CTTCATCATC 100 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 367 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 14..346

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

GGGGAACCTT CAT ATG CAG GAG GAG TGT AAA GAG CAG ATG CAG AAA CAG
               Met Gln Glu Glu Cys Lys Glu Gln Met Gln Lys Gln
               1               5                   10

AAG ATG CTC AGC CAC TGC AAG ATG TAC ATG AAA CAG CAG ATG GAG GAG
Lys Met Leu Ser His Cys Lys Met Tyr Met Lys Gln Gln Met Glu Glu
        15                  20                  25

AGC CCG TAC CAG AGC ATG CCG AAG AAG GGA ATG GAG CCG CAC ATG AGC
Ser Pro Tyr Gln Ser Met Pro Lys Lys Gly Met Glu Pro His Met Ser
    30                  35                  40

GAG TGC TGC GAG CAG CTG GAG GGG ATG GAC GAG AGC TGC AAA TGC GAA
Glu Cys Cys Glu Gln Leu Glu Gly Met Asp Glu Ser Cys Lys Cys Glu
45                  50                  55                  60

GGC CTA AAG ATG ATG ATG ATG AAG ATG CAA CAG GAG GAG ATG CAA CCC
Gly Leu Lys Met Met Met Met Lys Met Gln Gln Glu Glu Met Gln Pro
                65                  70                  75

AAG GGG GAG CAG ATG AAA AAG ATG ATG AAG CTA GCC GAG AAT ATC CCT
Lys Gly Glu Gln Met Lys Lys Met Met Lys Leu Ala Glu Asn Ile Pro
            80                  85                  90

TCC AAA TGC AAC CTC AGT CCC ATG AAA TGC CCC TTC AAA GGA AAG ATT
Ser Lys Cys Asn Leu Ser Pro Met Lys Cys Pro Phe Lys Gly Lys Ile
        95                  100                 105

GCC GGG TTC TGAGGATCCG AATTCAAGCT T
Ala Gly Phe
    110

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 105 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 



CCGTACCAGA GCATGCCGAA GAAGGGAATG GAGCCGCACA TGAAAGAGTG CTGCGAGCAG 60
CTGGAGGGGA TGGACGAGAG CTGCAAATGC GAAGGCCTAA AGATG 105
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 105 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
CATCTTTAGG CCTTCGCATT TGCAGCTCTC GTCCATCCCC TCCAGCTGCT CGCAGCACTC 60 
TTTCATGTGC GGCTCCATTC CCTTCTTCGG CATGCTCTGG TACGG 105 
(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 367 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 14.. 346 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

GGGGAACCTT CAT ATG CAG GAG GAG TGT AAA GAG CAG ATG CAG AAA CAG 49
Met Gln Glu Glu Cys Lys Glu Gln Met Gln Lys Gln
1 5 10

AAG ATG CTC AGC CAC TGC AAG ATG TAC ATG AAA CAG CAG ATG GAG GAG 97
Lys Met Leu Ser His Cys Lys Met Tyr Met Lys Gln Gln Met Glu Glu
15 20 25

AGC CCG TAC CAG AGC ATG CCG AAG AAG GGA ATG GAG CCG CAC ATG AAA 145
Ser Pro Tyr Gln Ser Met Pro Lys Lys Gly Met Glu Pro His Met Lys
30 35 40

GAG TGC TGC GAG CAG CTG GAG GGG ATG GAC GAG AGC TGC AAA TGC GAA 193
Glu Cys Cys Glu Gln Leu Glu Gly Met Asp Glu Ser Cys Lys Cys Glu
45 50 55 60

GGC CTA AAG ATG ATG ATG ATG AAG ATG CAA CAG GAG GAG ATG CAA CCC 241
Gly Leu Lys Met Met Met Met Lys Met Gln Gln Glu Glu Met Gln Pro
65 70 75

AAG GGG GAG CAG ATG AAA AAG ATG ATG AAG CTA GCC GAG AAT ATC CCT 289
Lys Gly Glu Gln Met Lys Lys Met Met Lys Leu Ala Glu Asn Ile Pro
80 85 90

TCC AAA TGC AAC CTC AGT CCC ATG AAA TGC CCC TTC AAA GGA AAG ATT 337
Ser Lys Cys Asn Leu Ser Pro Met Lys Cys Pro Phe Lys Gly Lys Ile
95 100 105

GCC GGG TTC TGAGGATCCG AATTCAAGCT T 367
Ala Gly Phe
110 
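The CDS feature of SEQ ID NO:21 (LOCATION: 14.. 346) places the first codon at nucleotide 14 of the listed strand. As a minimal sketch, the Python below builds the standard genetic code table and translates the first line of the record (nucleotides 1 to 49). The function name `translate` and the use of one-letter residue codes are illustrative choices and do not appear in the listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(seq: str, cds_start: int, cds_end: int) -> str:
    """Translate seq[cds_start..cds_end] (1-based, inclusive) codon by codon.

    Any trailing bases that do not complete a codon are ignored.
    """
    cds = seq[cds_start - 1:cds_end]
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


# First line of SEQ ID NO:21 (nucleotides 1-49), blanks stripped.
FIRST_LINE = (
    "GGGGAACCTT CAT ATG CAG GAG GAG TGT AAA GAG CAG ATG CAG AAA CAG"
).replace(" ", "")

if __name__ == "__main__":
    print(translate(FIRST_LINE, 14, 49))
    # Number of codons spanned by the full CDS location 14..346.
    print((346 - 14 + 1) // 3)
```

The translated fragment corresponds to the first twelve residues printed beneath the nucleotide line (Met Gln Glu Glu Cys Lys Glu Gln Met Gln Lys Gln), and the location 14..346 spans 333 nucleotides, or 111 codons.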

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 125 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
GGAGATATAC ATATGCAGGA GAAGTGTAAA GAGCAGATGC AGAAACAGAA GATGCTCAGC 60 
CACTGCAAGA TGTACATGAA ACAGCAGATG ATGAAGAGCC CGTACCAGAG CATGCCGAAG 120 
AAGCC 125 
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 125 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
CCCTTCTTCG GCATGCTCTG GTACGGGCTC TTCATCATCT GCTGTTTCAT GTACATCTTG 60 
CAGTGGCTGA GCATCTTCTG TTTCTGCATC TGCTCTTTAC ACTTCTCCTG CATATGTATA 120 
TCTCC 125 
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 367 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 






(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 14.. 346 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 



GGGGAACCTT CAT ATG CAG GAG AAG TGT AAA GAG CAG ATG CAG AAA CAG 49
Met Gln Glu Lys Cys Lys Glu Gln Met Gln Lys Gln
1 5 10

AAG ATG CTC AGC CAC TGC AAG ATG TAC ATG AAA CAG CAG ATG ATG AAG 97
Lys Met Leu Ser His Cys Lys Met Tyr Met Lys Gln Gln Met Met Lys
15 20 25

AGC CCG TAC CAG AGC ATG CCG AAG AAG GGA ATG GAG CCG CAC ATG AGC 145
Ser Pro Tyr Gln Ser Met Pro Lys Lys Gly Met Glu Pro His Met Ser
30 35 40

GAG TGC TGC GAG CAG CTG GAG GGG ATG GAC GAG AGC TGC AAA TGC GAA 193
Glu Cys Cys Glu Gln Leu Glu Gly Met Asp Glu Ser Cys Lys Cys Glu
45 50 55 60

GGC CTA AAG ATG ATG ATG ATG AAG ATG CAA CAG GAG GAG ATG CAA CCC 241
Gly Leu Lys Met Met Met Met Lys Met Gln Gln Glu Glu Met Gln Pro
65 70 75

AAG GGG GAG CAG ATG AAA AAG ATG ATG AAG CTA GCC GAG AAT ATC CCT 289
Lys Gly Glu Gln Met Lys Lys Met Met Lys Leu Ala Glu Asn Ile Pro
80 85 90

TCC AAA TGC AAC CTC AGT CCC ATG AAA TGC CCC TTC GGT GGA TCG ATT 337
Ser Lys Cys Asn Leu Ser Pro Met Lys Cys Pro Phe Gly Gly Ser Ile
95 100 105

GCC GGG TTC TGAGGATCCG AATTCAAGCT T 367
Ala Gly Phe
110



(2) INFORMATION FOR SEQ ID NO: 25: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 124 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 



GCGTACGACA TATGCAGGAG AAGTGTAAAG AGCAGATGCA GAAACAGAAG ATGCTTAAGC 60
ACTGCAAGAT GTACATGAAA CAGCAGATGG AGGAGAGCCC GTACCAGAGC ATGCCGAAGA 120
AGGG 124
(2) INFORMATION FOR SEQ ID NO:26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 124 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
CCCTTCTTCG GCATGCTCTG GTACGGGCTC TCCTCCATCT GCTGTTTCAT GTACATCTTG 60 
CAGTGCTTAA GCATCTTCTG TTTCTGCATC TGCTCTTTAC ACTTCTCCTG CATATGTCGT 120 
ACGC 124 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 108 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: 
CCGTACCAGA GCATGCCGAA GAAGGGAATG GAGAAGTGGA TGAGCGAGTG CTGCGAGCAG 60 
CTGGAGGGGA TGGACGAGAG CTGTAAATGC GAAGGCCTAA AGATGATG 108 
(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 108 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
CATCATCTTT AGGCCTTCGC ATTTACAGCT CTCGTCCATC CCCTCGAGCT GCTCGCAGCA 60 
CTCGCTCATC CACTTCTCCA TTCCCTTCTT CGGCATGCTC TGGTACGG 108
(2) INFORMATION FOR SEQ ID NO: 29:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 367 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 14.. 346 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: 



GGGGAACCTT CAT ATG CAG GAG AAG TGT AAA GAG CAG ATG CAG AAA CAG 49
Met Gln Glu Lys Cys Lys Glu Gln Met Gln Lys Gln
1 5 10

AAG ATG CTT AAG CAC TGC AAG ATG TAC ATG AAA CAG CAG ATG GAG GAG 97
Lys Met Leu Lys His Cys Lys Met Tyr Met Lys Gln Gln Met Glu Glu
15 20 25

AGC CCG TAC CAG AGC ATG CCG AAG AAG GGA ATG GAG AAG TGG ATG AGC 145
Ser Pro Tyr Gln Ser Met Pro Lys Lys Gly Met Glu Lys Trp Met Ser
30 35 40

GAG TGC TGC GAG CAG CTC GAG GGG ATG GAC GAG AGC TGC AAA TGC GAA 193
Glu Cys Cys Glu Gln Leu Glu Gly Met Asp Glu Ser Cys Lys Cys Glu
45 50 55 60

GGC CTA AAG ATG ATG ATG ATG AAG ATG CAA CAG GAG GAG ATG CAA CCC 241
Gly Leu Lys Met Met Met Met Lys Met Gln Gln Glu Glu Met Gln Pro
65 70 75

AAG GGG GAG CAG ATG AAA AAG ATG ATG AAG CTA GCC GAG AAT ATC CCT 289
Lys Gly Glu Gln Met Lys Lys Met Met Lys Leu Ala Glu Asn Ile Pro
80 85 90

TCC AAA TGC AAC CTC AGT CCC ATG AAA TGC CCC TTC GGT GGA TCG ATT 337
Ser Lys Cys Asn Leu Ser Pro Met Lys Cys Pro Phe Gly Gly Ser Ile
95 100 105

GCC GGG TTC TGAGGATCCG AATTCAAGCT T 367
Ala Gly Phe
110



(2) INFORMATION FOR SEQ ID NO: 30: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 139 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30: 
GCAGTCAGCA CCATGGCGAA CAAGCTCTTC CTCGTCTGTG CGGCACTTGC CCTCTGCTTC 60 
CTCCTTACGA ACGCGTCAAT TTACCGGACG GTCGTGGAGT TCGAGGAGGA CGACGCGACG 120 
AATCATATGC GTTACAGCG 139 
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 139 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
CGCTGTAACG CATATGATTC GTCGCGTCGT CCTCCTCGAA CTCCACGACC GTCCGGTAAA 60 
TTGACGCGTT CGTAAGGAGG AAGCAGACGG CAAGTGCCGC ACAGACGAGG AAGAGCTTGT 120 
TCGCCATGGT GCTGACTGC 139 
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 470 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 3.. 449 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

CC ATG GCG AAC AAG CTC TTC CTC GTC TGT GCG GCA CTT GCC CTC TGC 47
Met Ala Asn Lys Leu Phe Leu Val Cys Ala Ala Leu Ala Leu Cys
1 5 10 15

TTC CTC CTT ACG AAC GCG TCA ATT TAC CGG ACG GTC GTG GAG TTC GAG 95
Phe Leu Leu Thr Asn Ala Ser Ile Tyr Arg Thr Val Val Glu Phe Glu
20 25 30

GAG GAC GAC GCG ACG AAT CAT ATG CAG GAG GAG TGT AAA GAG CAG ATG 143
Glu Asp Asp Ala Thr Asn His Met Gln Glu Glu Cys Lys Glu Gln Met
35 40 45

CAG AAA CAG AAG ATG CTC AGC CAC TGC AAG ATG TAC ATG AAA CAG CAG 191
Gln Lys Gln Lys Met Leu Ser His Cys Lys Met Tyr Met Lys Gln Gln
50 55 60

ATG GAG GAG AGC CCG TAC CAG AGC ATG CCC AAG AAG GGA ATG GAG CCG 239
Met Glu Glu Ser Pro Tyr Gln Ser Met Pro Lys Lys Gly Met Glu Pro
65 70 75

CAC ATG AGC GAG TGC TGC GAG CAG CTG GAG GGG ATG GAC GAG AGC TGC 287
His Met Ser Glu Cys Cys Glu Gln Leu Glu Gly Met Asp Glu Ser Cys
80 85 90 95

AAA TGC GAA GGC CTA AAG ATG ATG ATG ATG AAG ATG CAA CAG GAG GAG 335
Lys Cys Glu Gly Leu Lys Met Met Met Met Lys Met Gln Gln Glu Glu
100 105 110

ATG CAA CCC AAG GGG GAG CAG ATG AAA AAG ATG ATG AAG CTA GCC GAG 383
Met Gln Pro Lys Gly Glu Gln Met Lys Lys Met Met Lys Leu Ala Glu
115 120 125

AAT ATC CCT TCC AAA TGC AAC CTC AGT CCC ATG AAA TGC CCC TTC GGT 431
Asn Ile Pro Ser Lys Cys Asn Leu Ser Pro Met Lys Cys Pro Phe Gly
130 135 140

GGA TCG ATT GCC GGG TTC TGAGGATCCG AATTCAAGCT T 470
Gly Ser Ile Ala Gly Phe
145

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 470 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 3.. 449 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 

CC ATG GCG AAC AAG CTC TTC CTC GTC TGT GCG GCA CTT GCC CTC TGC 47
Met Ala Asn Lys Leu Phe Leu Val Cys Ala Ala Leu Ala Leu Cys
1 5 10 15

TTC CTC CTT ACG AAC GCG TCA ATT TAC CGG ACG GTC GTG GAG TTC GAG 95
Phe Leu Leu Thr Asn Ala Ser Ile Tyr Arg Thr Val Val Glu Phe Glu
20 25 30

GAG GAC GAC GCG ACG AAT CAT ATG CAG GAG AAG TGT AAA GAG CAG ATG 143
Glu Asp Asp Ala Thr Asn His Met Gln Glu Lys Cys Lys Glu Gln Met
35 40 45

CAG AAA CAG AAG ATG CTC AGC CAC TGC AAG ATG TAC ATG AAA CAG CAG 191
Gln Lys Gln Lys Met Leu Ser His Cys Lys Met Tyr Met Lys Gln Gln
50 55 60

ATG ATG AAG AGC CCG TAC CAG AGC ATG CCC AAG AAG GGA ATG GAG CCG 239
Met Met Lys Ser Pro Tyr Gln Ser Met Pro Lys Lys Gly Met Glu Pro
65 70 75

CAC ATG AGC GAG TGC TGC GAG CAG CTG GAG GGG ATG GAC GAG AGC TGC 287
His Met Ser Glu Cys Cys Glu Gln Leu Glu Gly Met Asp Glu Ser Cys
80 85 90 95

AAA TGC GAA GGC CTA AAG ATG ATG ATG ATG AAG ATG CAA CAG GAG GAG 335
Lys Cys Glu Gly Leu Lys Met Met Met Met Lys Met Gln Gln Glu Glu
100 105 110

ATG CAA CCC AAG GGG GAG CAG ATG AAA AAG ATG ATG AAG CTA GCC GAG 383
Met Gln Pro Lys Gly Glu Gln Met Lys Lys Met Met Lys Leu Ala Glu
115 120 125

AAT ATC CCT TCC AAA TGC AAC CTC AGT CCC ATG AAA TGC CCC TTC GGT 431
Asn Ile Pro Ser Lys Cys Asn Leu Ser Pro Met Lys Cys Pro Phe Gly
130 135 140

GGA TCG ATT GCC GGG TTC TGAGGATCCG AATTCAAGCT T 470
Gly Ser Ile Ala Gly Phe
145

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 120 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
GATGATGAAG CTAGCCTGGA ATATCCCTTG GAAATGCAAC CTCAGTCCCA TGAAATGCCC 60 
CTGGGGTGGA AAGATTGCCG GGTTCTGACC GCGGATCCGA ATTCAAGCTT ACGTAACGAC 120 
(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 120 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
GTCGTTACGT AAGCTTGAAT TCGGATCCGC GGTCAGAACC CGGCAATCTT TCCACCCCAG 60 
GGGCATTTCA TGGGACTGAG GTTGCATTTC CAAGGGATAT TCCAGGCTAG CTTCATCATC 120 
(2) INFORMATION FOR SEQ ID NO:36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 474 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 3.. 449

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: 

CC ATG GCG AAC AAG CTC TTC CTC GTC TGT GCG GCA CTT GCC CTC TGC 47
Met Ala Asn Lys Leu Phe Leu Val Cys Ala Ala Leu Ala Leu Cys
1 5 10 15

TTC CTC CTT ACG AAC GCG TCA ATT TAC CGG ACG GTC GTG GAG TTC GAG 95
Phe Leu Leu Thr Asn Ala Ser Ile Tyr Arg Thr Val Val Glu Phe Glu
20 25 30

GAG GAC GAC GCG ACG AAT CAT ATG CAG GAG GAG TGT AAA GAG CAG ATG 143
Glu Asp Asp Ala Thr Asn His Met Gln Glu Glu Cys Lys Glu Gln Met
35 40 45

CAG AAA CAG AAG ATG CTC AGC CAC TGC AAG ATG TAC ATG AAA CAG CAG 191
Gln Lys Gln Lys Met Leu Ser His Cys Lys Met Tyr Met Lys Gln Gln
50 55 60

ATG GAG GAG AGC CCG TAC CAG AGC ATG CCC AAG AAG GGA ATG GAG CCG 239
Met Glu Glu Ser Pro Tyr Gln Ser Met Pro Lys Lys Gly Met Glu Pro
65 70 75

CAC ATG AGC GAG TGC TGC GAG CAG CTG GAG GGG ATG GAC GAG AGC TGC 287
His Met Ser Glu Cys Cys Glu Gln Leu Glu Gly Met Asp Glu Ser Cys
80 85 90 95

AAA TGC GAA GGC CTA AAG ATG ATG ATG ATG AAG ATG CAA CAG GAG GAG 335
Lys Cys Glu Gly Leu Lys Met Met Met Met Lys Met Gln Gln Glu Glu
100 105 110

ATG CAA CCC AAG GGG GAG CAG ATG AAA AAG ATG ATG AAG CTA GCC TGG 383
Met Gln Pro Lys Gly Glu Gln Met Lys Lys Met Met Lys Leu Ala Trp
115 120 125

AAT ATC CCT TGG AAA TGC AAC CTC AGT CCC ATG AAA TGC CCC TGG GGT 431
Asn Ile Pro Trp Lys Cys Asn Leu Ser Pro Met Lys Cys Pro Trp Gly
130 135 140

GGA AAG ATT GCC GGG TTC TGACCGCGGA TCCGAATTCA AGCTT 474
Gly Lys Ile Ala Gly Phe
145 
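The encoded proteins of SEQ ID NO:32 and SEQ ID NO:36 can be compared residue by residue to locate the substitutions that distinguish them. The Python sketch below does this for the C-terminal region only (residues 112 to 149 as numbered in the listing). The function name `substitutions` and the one-letter strings are illustrative; the strings are transcribed from the last three translation lines of each record.

```python
from typing import List


def substitutions(ref: str, var: str, first_pos: int) -> List[str]:
    """List point substitutions between two equal-length, pre-aligned
    protein segments, labelled <ref residue><position><variant residue>.

    first_pos is the residue number of the first character of each string.
    """
    if len(ref) != len(var):
        raise ValueError("segments must have equal length")
    return [
        f"{a}{first_pos + i}{b}"
        for i, (a, b) in enumerate(zip(ref, var))
        if a != b
    ]


# Residues 112-149, one-letter code, one string literal per listing line.
SEQ32_TAIL = "MQPKGEQMKKMMKLAE" "NIPSKCNLSPMKCPFG" "GSIAGF"
SEQ36_TAIL = "MQPKGEQMKKMMKLAW" "NIPWKCNLSPMKCPWG" "GKIAGF"

if __name__ == "__main__":
    print(substitutions(SEQ32_TAIL, SEQ36_TAIL, 112))
```

Run on these two segments, the comparison reports three positions at which tryptophan appears in SEQ ID NO:36 (127, 131 and 142) and one at which lysine appears (145).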
type protein; and

(v) the modified protein comprises at least one non-conservative amino
acid substitution not within the hypervariable loop, the substitution consisting of
replacement of a non-essential amino acid with an essential amino acid.

2. The modified Brazil Nut 2S albumin seed storage protein of Claim
wherein:

(i) the amino acid sequence of the modified protein is at least 82%
homologous to the wild type Brazil Nut 2S albumin seed storage protein;

(ii) all cysteine, proline, leucine and methionine residues of the modified
protein are conserved relative to the wild type protein;

(iii) all arginine residues of the wild type protein are substituted with
lysine residues;

(iv) the modified protein comprises at least three non-conservative amino
acid substitutions not within the hypervariable loop, said substitutions comprising
substituting two glutamic acid residues with lysine residues and substituting one
glutamine residue with a lysine residue.

3. The modified Brazil Nut 2S albumin seed storage protein of Claim 1
wherein the protein is a member of the group consisting of BNCNSS, BN11, BN15,
BN17, BN18, BN19, BN153KW, AT2S1BN15, AT2S1BN19 and AT2S1BN153W.

4. An isolated nucleic acid fragment encoding the modified Brazil Nut 2S
albumin seed storage protein of Claim 1.

5. The nucleic acid fragment of Claim 4 comprising a nucleotide sequence
identical or substantially similar to a member selected from the group consisting of SEQ
ID NO:7, SEQ ID NO:10, SEQ ID NO:15, SEQ ID NO:18, SEQ ID NO:21, SEQ ID
NO:24, SEQ ID NO:29, SEQ ID NO:32, SEQ ID NO:33 and SEQ ID NO:36.

6. The nucleic acid fragment of Claim 5 comprising a nucleotide sequence
selected from the group consisting of nucleotide sequences that encode the amino acid
sequence set forth in SEQ ID NO:7, SEQ ID NO:10, SEQ ID NO:15, SEQ ID NO:18,
SEQ ID NO:21, SEQ ID NO:24, SEQ ID NO:29, SEQ ID NO:32, SEQ ID NO:33 or
SEQ ID NO:36.

7. A chimeric gene comprising the nucleic acid fragment of Claim 4, Claim 5
or Claim 6, operably linked to suitable regulatory sequences.

8. A transformed plant comprising in its genome the chimeric gene of Claim 7.

9. The transformed plant of Claim 8 wherein the plant is a member of the
group consisting of soybean, rapeseed, sunflower, cotton, corn, tobacco, alfalfa, wheat,
barley, oats, sorghum, rice and forage grasses.

10. Seeds derived from the transformed plant of Claim 8 wherein the seeds
comprise the chimeric gene.

11. A method for increasing the essential amino acid content of seeds
comprising:

(a) preparing a nucleic acid fragment encoding a modified Brazil Nut 2S
albumin seed storage protein wherein:

(i) the amino acid sequence of the modified protein is at least 40%
homologous to the wild type Brazil Nut 2S albumin seed storage protein;

(ii) all cysteine residues of the modified protein are conserved
relative to the wild type protein;

(iii) at least 40% of proline residues are conserved relative to the
wild type protein;

(iv) at least 80% of leucine residues are conserved relative to the
wild type protein; and

(v) the modified protein comprises at least one non-conservative
amino acid substitution not within the hypervariable loop, the substitution consisting of
replacement of a non-essential amino acid with an essential amino acid;

(b) preparing a chimeric gene comprising the nucleic acid fragment of
step (a) operably linked to suitable regulatory sequences;

(c) transforming a plant with the chimeric gene of step (b); and

(d) obtaining seeds from the transformed plant of step (c).
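The numerical limits recited in step (a) of Claim 11 can be illustrated with a short Python sketch. It rests on several assumptions that the claim does not state: the two sequences are already aligned to equal length, "homologous" is approximated as percent identity, the essential residues are taken to be the four named in the Background (Met, Cys, Lys, Trp), and no attempt is made to judge whether a substitution is "non-conservative". All function names, the toy sequences, and the loop coordinates are hypothetical.

```python
# Assumption: essential residues limited to Met, Cys, Lys, Trp (one-letter).
ESSENTIAL = frozenset("MCKW")


def percent_identity(wild: str, modified: str) -> float:
    """Percent of aligned positions carrying the same residue."""
    matches = sum(a == b for a, b in zip(wild, modified))
    return 100.0 * matches / len(wild)


def fraction_conserved(wild: str, modified: str, residue: str) -> float:
    """Fraction of wild-type occurrences of `residue` kept in `modified`."""
    positions = [i for i, aa in enumerate(wild) if aa == residue]
    if not positions:
        return 1.0
    return sum(modified[i] == residue for i in positions) / len(positions)


def meets_claim_11a(wild: str, modified: str, loop: range,
                    essential: frozenset = ESSENTIAL) -> bool:
    """Check limits (i)-(v) of step (a) under the assumptions stated above.

    `loop` holds the 0-based indices of the hypervariable loop.
    """
    if len(wild) != len(modified):
        raise ValueError("sequences must be pre-aligned to equal length")
    # (v): some substitution outside the loop swaps a non-essential
    # residue for an essential one.
    enriching = any(
        w not in essential and m in essential
        for i, (w, m) in enumerate(zip(wild, modified))
        if i not in loop and w != m
    )
    return (
        percent_identity(wild, modified) >= 40.0               # (i)
        and fraction_conserved(wild, modified, "C") == 1.0     # (ii)
        and fraction_conserved(wild, modified, "P") >= 0.40    # (iii)
        and fraction_conserved(wild, modified, "L") >= 0.80    # (iv)
        and enriching                                          # (v)
    )


if __name__ == "__main__":
    wild_toy = "MCPLERQPLC"  # hypothetical toy sequence, not from the listing
    print(meets_claim_11a(wild_toy, "MCPLKKQPLC", range(6, 8)))
    print(meets_claim_11a(wild_toy, "MSPLKKQPLC", range(6, 8)))
```

In the toy example the first variant replaces Glu and Arg with Lys outside the assumed loop and keeps every Cys, Pro and Leu, so it passes; the second loses a cysteine and therefore fails limit (ii).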